Mathematics for
Engineers and
Scientists
Alan Jeffrey
Applications of Mathematics Series
Editor: Alan Jeffrey
Professor of Engineering Mathematics
University of Newcastle-upon-Tyne
William F. Ames
Numerical Methods for Partial Differential Equations
T. J. M. Boyd and J. J. Sanderson
Plasma Dynamics
C. D. Green
Integral Equation Methods
I. H. Hall
Deformation of Solids
Jeremy Hirschhorn
Dynamics of Machinery
Alan Jeffrey
Mathematics for Engineers and Scientists
Brian Porter
Synthesis of Dynamical Systems
NELSON
Thomas Nelson and Sons Ltd
36 Park Street London W1Y 4DE
Nelson (Africa) Ltd
PO Box 18123 Nairobi Kenya
Thomas Nelson (Australia) Ltd
171-175 Bank Street South Melbourne Victoria 3205
Thomas Nelson and Sons (Canada) Ltd
81 Curlew Drive Don Mills Ontario
Thomas Nelson (Nigeria) Ltd
PO Box 336 Apapa Lagos
First published in Great Britain by Thomas Nelson and Sons Ltd., 1969
Reprinted with amendments 1971
Reprinted 1973
Copyright © Alan Jeffrey 1969, 1971
All Rights Reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by
any means, electronic, mechanical, photocopying, recording or otherwise,
without the prior permission of the publishers.
. 761605 9 (Boards)
17 771604 5 (Paper)
Reproduced and printed by photolithography and bound in
Great Britain at The Pitman Press, Bath
Preface
This book has evolved from an introductory course in mathematics given to
engineering students at the University of Newcastle-upon-Tyne during the
last few years. It represents the author's attempt to offer the engineering
student, and the science student who is not majoring in a mathematical
aspect of his subject, a broad and modern account of those parts of mathe-
matics that are finding increasingly important application in the everyday
development of his subject.
Although this book does not seek to teach any of the many physical
disciplines to which its results and methods may be applied, it nevertheless
makes free use of them for purposes of illustration whenever this seems to be
helpful. Every effort has been made to integrate the various chapters into a
description of mathematics as a single subject, and not as a collection of
seemingly unrelated topics. Thus, for example, matrices are not only intro-
duced in an algebraic context, but they are also related in other chapters to
change of variables in partial differentiation and to the study of simultaneous
differential equations.
Modern notation and terminology have been used freely but, it is hoped,
never to the point of becoming pedantic when a simple word or phrase seems
more natural. Of necessity, much of the material in this book is standard,
though the emphasis and manner of introduction and presentation frequently
differs from that found elsewhere. This is deliberate, and is a reflection of
the changing importance of mathematical topics in engineering and science
to-day.
In many introductory mathematics texts for engineering and science
students no serious attempt is made to offer reasonable proofs of main
results and, instead, attention is largely confined to their manipulation.
Important though this aspect undoubtedly is, it is the author's belief that
knowledge of the proof of a result is often as essential as its subsequent
application, and that the modern student needs and merits both. With this
thought in mind proofs of results have always been included, and, though
they have been kept as simple as possible, no attempt has been made to
conceal difficulty where it exists. Only very occasionally, when the proof of
a result is lengthy, and its details are largely irrelevant to the subsequent
development of the argument, has the treatment been shortened to a summary
of the logical steps involved. Even then the interested reader can often find
more relevant information amongst the specially selected problems at the
end of each chapter.
As implied by the previous remark, the many problems not only comprise
those offering manipulative exercise, but also those shedding further light
on topics only touched upon in the main text. No serious student can progress
in his knowledge of this subject without a proper investment of time and effort
spent working at a selection of these problems. The main text is provided with
numerous illustrative examples designed to be helpful both when working
through the text and when attempting the classified problems. It is hoped
that their inclusion also makes the book suitable for private study.
The wide range of material covered in this book represents rather more
than would normally be contained in an introductory course of lectures.
Whilst allowing for changing approaches in teaching, this fact also permits
some flexibility in use of the material and at the same time offers further
relevant reading to the ambitious student. In addition to the author's own
experience of the application of mathematics in engineering and science, the
choice and style of presentation of material has been influenced by two
recently published documents: the Council of Engineering Institutions
syllabuses in mathematics in Britain and the CUPM recommendations made
by the Mathematical Association of America. It is the author's hope that
this book complies fully with the former document and with the spirit of the
latter insofar as its recommendations are applicable to engineering and
science students.
The material has all been class-tested and, as a result, has undergone
considerable modification from its first appearance as lecture notes to the
form of presentation adopted here. It is a pleasure to acknowledge the help
of the publishers who have given me continued encouragement and every
possible form of assistance throughout the entire period of preparation of
the book.
A. J.
As a direct result of requests by users of the first printing of this book it
was decided that a short chapter on Fourier Series should be added. The
present revised imprint contains this new material and also incorporates a
number of small corrections drawn to the author's attention by various kind
readers.
A. J.
Contents
1 Introduction to Sets and Numbers
1-1 Sets and algebra
1-2 Set theory and probability
1-3 Integers, rationals and arithmetic laws
1-4 Absolute value of a real number
1-5 Representation of numbers
1-6 Mathematical induction
Problems
2 Variables, Functions, and Mappings
2-1 Variables and functions
2-2 Inverse functions
2-3 Some special functions
2-4 Digression on mappings
2-5 Curves and parameters
2-6 Functions of several real variables
Problems
3 Sequences, Limits, and Continuity
3-1 Sequences
3-2 Limits of sequences
3-3 The number e
3-4 Limits of functions: continuity
3-5 Functions of several variables: limits, continuity
3-6 A useful connecting theorem
Problems
4 Complex Numbers and Vectors
4-1 Introductory ideas
4-2 Basic algebraic rules for complex numbers
4-3 Complex numbers as vectors
4-4 Modulus-argument form of complex numbers
4-5 Roots of complex numbers
4-6 Introduction to space vectors
4-7 Scalar and vector products
4-8 Geometrical applications
4-9 Applications to mechanics
Problems
5 Differentiation of Functions of One or More Real Variables
5-1 The derivative
5-2 Rules of differentiation
5-3 Some important consequences of differentiability
5-4 Higher derivatives: applications
5-5 Partial differentiation
5-6 Total differential
5-7 Envelopes
5-8 The chain rule and its consequences
5-9 Change of variable
5-10 Implicit functions
5-11 Higher order partial derivatives
Problems
6 Exponential, Hyperbolic, and Logarithmic Functions
6-1 The exponential function
6-2 Differentiation of functions involving the exponential function
6-3 The logarithmic function
6-4 Hyperbolic functions
6-5 Exponential function with a complex argument
Problems
7 Fundamentals of Integration
7-1 Definite integrals and areas
7-2 Integration of arbitrary continuous functions
7-3 Integral inequalities
7-4 The definite integral as a function of its upper limit: indefinite integral
7-5 Differentiation of an integral containing a parameter
7-6 Other geometrical applications of definite integrals
7-7 Numerical integration
Problems
8 Systematic Integration
8-1 Integration of elementary functions
8-2 Integration by substitution
8-3 Integration by parts
8-4 Reduction formulae
8-5 Integration of rational functions: partial fractions
8-6 Other special techniques of integration
Problems
9 Linear Transformations and Matrices
9-1 Introductory ideas
9-2 Matrix algebra
9-3 Determinants
9-4 Linear dependence and linear independence
9-5 Inverse and adjoint matrix
9-6 Matrix functions of a single variable
9-7 Solution of systems of linear equations
9-8 Eigenvalues and eigenvectors
9-9 Linear transformations
9-10 Applications of matrices and linear transformations
Problems
10 Functions of a Complex Variable
10-1 Sequences of complex numbers and limits
10-2 Curves and regions
10-3 Function of a complex variable, limits and continuity
10-4 Derivatives: Cauchy-Riemann equations
10-5 Conformal mapping
10-6 Applications of conformal mapping
Problems
11 Scalars, Vectors, and Fields
11-1 Curves in space
11-2 Antiderivatives and integrals of vector functions
11-3 Some applications
11-4 Fields, gradient, and directional derivative
11-5 An application to fluid mechanics
Problems
12 Series, Taylor's Theorem and its Uses
12-1 Series
12-2 Power series
12-3 Taylor's theorem
12-4 Application of Taylor's theorem
12-5 Applications of the generalized mean value theorem
Problems
13 Differential Equations and Geometry
13-1 Introductory ideas
13-2 Possible physical origin of some equations
13-3 Arbitrary constants and initial conditions
13-4 Properties of solutions: isoclines
13-5 Orthogonal trajectories
13-6 Modified Euler method
13-7 A simple predictor-corrector method
Problems
14 First Order Differential Equations
14-1 Equations with separable variables
14-2 Homogeneous equations
14-3 Exact equations
14-4 The linear equation of first order
14-5 Equations with implicit dependence on x
14-6 Clairaut's and Lagrange's equations
14-7 Picard's iterative method
14-8 Direct deductions and comparison theorems
Problems
15 Higher Order Differential Equations
15-1 Linear equations with constant coefficients: homogeneous case
15-2 Linear equations with constant coefficients: inhomogeneous case
15-3 Variation of parameters
15-4 Simultaneous linear differential equations
15-5 Series solution of differential equations
15-6 Runge-Kutta method
15-7 Oscillatory solutions
15-8 Coupled oscillations and normal modes
15-9 The Laplace transform
Problems
16 Fourier Series
16-1 Introductory ideas
16-2 Convergence of Fourier series
16-3 Different forms of Fourier series
16-4 Differentiation and Integration
Problems
Answers to selected problems
Index
Introduction to sets and
numbers
1-1 Sets and algebra
In applications of mathematics to engineering and science, we often use the
properties of real numbers. Many of these properties are intuitively obvious,
but others are more subtle and depend for their proper use on a simple
understanding of the mathematical basis of the so-called real number system.
This chapter describes the elements of the real number system in a straight-
forward manner for subsequent use throughout the book.
The reader will certainly know how to work with finite combinations of
numbers, but what is less certain is whether he understands how to interpret
and use limiting processes. For example, what is the meaning and what, if
any, is the value to be associated with the limit
$\lim_{n \to \infty} [\,\text{expression illegible in source}\,]$
which is to be interpreted as the value approached by the expression in square
brackets as n increases without bound ?
It was questions such as these and, indeed, far simpler ones that first led
to the study of real numbers. Many properties of numbers, nowadays accepted
by all as self-evident, were once regarded as questionable. This is still clearly
apparent from much of the notation that is in current use.
Thus, for example, the fact that $\sqrt{2}$ cannot be expressed as the ratio of
two integers led to its being termed an irrational number. Even more extreme
is the term imaginary number that is given to $\sqrt{-1}$. Although, as we shall
see later, this number does not belong to the real number system and so
merits special consideration, it is however no less real than the integer 2.
Experience suggests that in any systematic development of the properties
of the real number system, the operations of addition and multiplication must
play a fundamental role. These conjectures are of course true, but underlying
the idea of real numbers and their algebraic manipulation are the even more
fundamental concepts of sets and their associated algebra. Because these
notions are sometimes unfamiliar, we shall start by considering some simple
but important ideas concerning sets.
We must first define the term set for which the alternative terms aggregate,
class, and collection are also often used. Our approach will be direct and
pragmatic and we shall agree that a set comprises a collection of objects or
elements, each of which is chosen for membership of the set because it
possesses some required property. Membership of the set is determined en-
tirely by this property; an object only belongs to the set if it possesses the
required property, otherwise it does not belong to the set. The properties of
membership and non-membership of a set are mutually exclusive.
An important numerical set which we shall often have occasion to use is
the set N of natural numbers 1, 2, 3, . . ., used in counting. In future the
symbol N will always be used to signify this natural set of positive integers.
Notice that there can be no greatest member m of this set, since however
large m may be, m + 1 is larger and yet is also a member of the set N.
Accordingly, when we use a number m that is allowed to increase without
restriction, it will be convenient to imply this by saying that 'm tends to
infinity', and to write the statement in the form $m \to \infty$. Notice that infinity
is not a number in the usual sense, but just the outcome of the mathematical
process of allowing m to increase without bound. It is always necessary to
relate the symbol $\infty$ to some mathematical expression, since by itself it has
little or no meaning.
N is only one type of set however, and from the wording of our definition
it is apparent that the elements of a set need not be numerical. Thus in statistics
one is concerned with sets of events which may or may not be numerical,
whereas in the analysis of logical operations one is concerned with sets of
decisions. The notation and simple algebra we now develop are applicable to
all sets and, hence, to any situations such as those just enumerated which are
capable of description in terms of sets.
To simplify the manipulation of these ideas we must introduce a notation
for elements of a set, for sets themselves, and for the membership of an
element to a set. It is customary to denote general elements of sets by lower
case letters a, b, . . ., x, . . ., and sets themselves by capital letters A, B,
. . ., S, . . .. If a is a member of set A we shall write
$a \in A$.
This is usually read 'a is an element of A'. Conversely, if a is not an element of
A we shall write
$a \notin A$.
In this notation we have $3 \in N$, but $\pi \notin N$, where $\pi = 3.1415\ldots$, and N is
the set of natural numbers.
If a set only contains a small number of elements it is often simplest to
define it by enumerating the elements. Hence, for a set $S$ comprising the four
integer elements 3, 4, 5, and 6 we would write $S = \{3, 4, 5, 6\}$. This set is a
finite set in the sense that it comprises a finite number of elements. Con-
versely, the set N of natural numbers is an infinite set since it contains an
infinite number of elements.
Often it is useful to have a notation which indicates the membership
criterion that is to be used for the set. Thus, if we were interested in the set B
of positive integers n whose squares lie between the positive numbers m and
2m, we would write
$B = \{n \mid n \in N,\ m < n^2 < 2m\}$.
Here we have used the convention that the symbol n to the left of the vertical
rule signifies a general element of the set in question, whilst the expressions to
the right of the rule express the membership criteria for the set. Here, of
course, the symbol < when used in conjunction with numbers a and b in
the form a < b is to be read 'a less than b'.
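The membership criterion for the set B can be tried out directly. The following is a minimal Python sketch, not part of the original text; the function name `squares_between` and the choice m = 10 are illustrative assumptions.

```python
import math


def squares_between(m):
    """Return B = {n | n in N, m < n^2 < 2m} for a positive number m."""
    # n^2 < 2m forces n < sqrt(2m), so a finite search range is enough.
    upper = math.isqrt(int(2 * m)) + 1
    return {n for n in range(1, upper + 1) if m < n * n < 2 * m}


# Illustrative choice m = 10: the squares must lie strictly between 10 and 20.
B = squares_between(10)
print(B)
```

The set comprehension mirrors the notation closely: the expression to the left of `for` is the general element, and the condition after `if` is the membership criterion.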
An important set that is frequently used is the set of ordered pairs. An
element of this set will be written (m, n), where m and n are not necessarily
numbers and the element (m, n) is different from the element (n, m) unless
m and n are identical. An important use of this set is in the construction of
tables, when the ordered pair becomes an ordered number pair, the first
member of which is usually the argument and the second member the functional
value. Hence the ordered number pair $(\tfrac{1}{6}\pi, 0.5)$ could refer to the
sine of the angle $\tfrac{1}{6}\pi$ radians. In this example the relationship between the
first and second numbers of the ordered pair is determinate since $\sin \tfrac{1}{6}\pi = 0.5$,
but this is not always the case with ordered pairs. Thus if the ordered pair of
integers (m, n) were used to describe the throw of a die in a series of N
trials, as the statistician would call them, then m could represent the number
of the throw or the trial number, and n the score resulting from that throw.
Here m would range from unity to N, the number of trials in the statistical
experiment, and n would be any integer between 1 and 6. There would then
be no rule by which n could be predicted for any given m.
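The contrast between determinate and indeterminate ordered pairs may be sketched in Python as follows. This is an illustration only; the names `sine_table` and `die_trials`, the tabulated arguments, and the use of a seeded pseudo-random generator to stand in for a real die are all assumptions.

```python
import math
import random

# Determinate pairs: the second member is fixed by the first (argument, sine).
sine_table = [(x, round(math.sin(x), 4)) for x in (0.0, math.pi / 6, math.pi / 2)]


def die_trials(n_trials, seed=None):
    """Return indeterminate pairs (trial number, score) for n_trials throws."""
    rng = random.Random(seed)
    return [(m, rng.randint(1, 6)) for m in range(1, n_trials + 1)]


pairs = die_trials(5, seed=0)

# Order matters: (m, n) differs from (n, m) unless m and n are identical.
print((1, 4) == (4, 1))
print(sine_table)
print(pairs)
```

In the table the second entry can be recomputed from the first, whereas in the record of throws no rule gives the score from the trial number.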
Ordered number pairs are also encountered when constructing graphs of
functions where the convention is usually that (a, b) signifies the point with
x-coordinate a and y-coordinate b. Thus the graph of the function $y = f(x)$
for which x is between a and b could be written in set notation
$S = \{(x, f(x)) \mid a < x < b\}$.
The notation of an ordered pair as an element of a set readily extends to
an ordered triple (m, n, r), which again need not necessarily involve numerical
quantities, nor need it be determinate. Again, two ordered triples will only be
identical if their corresponding entries are identical. Ordered number triples
of a determinate kind occur when considering the graph of a function of two
independent variables as, for example, the equilibrium temperature at a
given point of a cross-section of a very long metal bar.
Statistical events provide the most common source of ordered triples of
the indeterminate variety. As a simple illustration we may consider the
statistical experiment comprising N trials, each of which involves tossing a
coin twice and recording the results of each throw as a 'head' (H) or a 'tail'
(T). Then the first quantity in the ordered triple could record the trial number
with the second and third quantities recording an H or a T according as the
first and second throws gave rise to a 'head' or a 'tail'. A typical ordered
triple would then be (3, T, H) in which the second and third entries in the
ordered triple cannot be predicted from a knowledge of the first entry.
It is often necessary to study relationships between sets and for this pur-
pose an algebra of sets must be constructed. The simplest situation that can
occur is that from a set A, a new set B is formed, such that all elements
of B are also elements of A. Such a set B will be called a subset of A. This
result will be written
$B \subseteq A$,
which is to be read 'B is a subset of A'.
If x is an element of A, so that we may write $x \in A$, then either $x \in B$,
or $x \notin B$. When there are some elements $x \in A$ which are not to be found in
B, so that $x \notin B$, then B is called a proper subset of A, the result being written
$B \subset A$.
The definition of a subset B of A does not preclude the possibility that
for every element $x \in A$ it is also true that $x \in B$. When this occurs sets A
and B have the same elements and are said to be equal, the result being
written
$A = B$.
It is clear from the definition of equality that when $A = B$ both the
statements $A \subseteq B$ and $B \subseteq A$ must be true. These last two statements are
often useful as an alternative definition of equality between sets.
With the above definitions it is clear that if $A = N$ and $B = \{1, 2, 3, 4, 5\}$,
then $B \subset A$; whereas if $A = \{4, 7, 3, 5, 9\}$ and $B = \{7, 4, 5, 9, 3\}$, then
$A \subseteq B$ and $B \subseteq A$ so that $A = B$.
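These relations correspond to the comparison operators on Python's built-in sets. The sketch below reuses the second example from the text; the auxiliary set `C` is an assumption introduced only to show a proper subset, since the infinite set N cannot be stored explicitly.

```python
A = {4, 7, 3, 5, 9}
B = {7, 4, 5, 9, 3}
C = {3, 4, 5}          # hypothetical extra set, a proper subset of A

print(C <= A)              # subset
print(C < A)               # proper subset
print(A <= B and B <= A)   # alternative definition of equality
print(A == B)              # equality: order of listing is irrelevant
```

Here `<=` plays the part of the subset relation and `<` that of the proper subset relation.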
A more general situation arises when two sets A and B are involved, each
of which possesses elements which are not common to the other so that
neither statement $A \subset B$, nor $B \subset A$ is true. The set of elements C that is
common to these two sets A and B will be called the intersection of the sets
A and B and is written
$C = A \cap B$.
Sometimes this is read 'A cap B' with the understanding just defined.
In the event that there are no elements common to the sets A and B we
shall write
$A \cap B = \phi$,
with the understanding that $\phi$ is the null set, which we define to be the set
containing no elements. Under these circumstances the sets A and B are said
to be disjoint.
By way of example, if $A_1 = \{a, b, 1, 3, 5, 7\}$ and $B_1 = \{a, c, d, e, 3, 7, 9\}$,
then $A_1 \cap B_1 = \{a, 3, 7\}$; whereas if $A_2 = \{1, 3, 7\}$ and $B_2 = \{0, 4, 9, 11\}$,
$A_2 \cap B_2 = \phi$.
Fig. 1-1 Symbolic representation of set operations: (a) proper subset, $A \subset B$; (b) intersection, $A \cap B$; (c) union, $A \cup B$.
Another important set related to sets A and B is the set C containing all
the elements belonging to A, to B or to both A and B. This is called the union
of sets A and B and is written
$C = A \cup B$,
which reads 'A cup B'. With the sets defined above we obviously have
$A_1 \cup B_1 = \{a, b, c, d, e, 1, 3, 5, 7, 9\}$ and $A_2 \cup B_2 = \{0, 1, 3, 4, 7, 9, 11\}$.
Clearly, for any set A we have $\phi \subseteq A$, $A \cup \phi = A$, and $A \cap \phi = \phi$.
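The intersections and unions of the example sets can be reproduced with Python's set operators `&` and `|`. In this sketch the letters a, b, . . . are represented by one-character strings, which is an assumption made purely for illustration.

```python
A1 = {"a", "b", 1, 3, 5, 7}
B1 = {"a", "c", "d", "e", 3, 7, 9}
A2 = {1, 3, 7}
B2 = {0, 4, 9, 11}
null_set = set()   # the set containing no elements

print(A1 & B1)             # intersection of A1 and B1
print(A2 & B2)             # empty, so A2 and B2 are disjoint
print(A2.isdisjoint(B2))
print(A1 | B1)             # union of A1 and B1
print(A2 | B2)             # union of A2 and B2

# Properties of the null set quoted in the text.
print(null_set <= A1, (A1 | null_set) == A1, (A1 & null_set) == null_set)
```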
These seemingly abstract ideas can be illustrated symbolically by means
of a very convenient device. This is the so called Venn diagram, which uses a
pictorial representation for the sets in question. Sets are represented by the
interior of closed curves, usually of arbitrary shape, and their relationship is
then illustrated by the relationships that exist between these curves. Thus,
when as in Fig. 1-1 (a) curve A representing set A lies within curve B
representing set B, we have the situation that A is a proper subset of B, so that
$A \subset B$. Figs 1-1 (b), (c) illustrate, respectively, the intersection $A \cap B$ and
the union $A \cup B$ of sets A and B, which are shown as shaded areas on those
figures.
Fig. 1-2 Sets in plane: (a) intersection, $A \cap B$; (b) union, $A \cup B$.
In general this representation is only symbolic, but in the event that
elements of the sets A and B may be unambiguously represented by points
in the plane, the Venn diagrams become true representations.
Let set A comprise all the points within and on a circle of unit radius,
usually called a unit circle, and centred on the origin, and let B comprise all
the points within and on the circle of radius 2 centred on the point $x = 2.5$
on the x-axis. Then the relationships $A \cap B$ and $A \cup B$ are truly represented
by the shaded areas in Figs 1-2 (a), (b).
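Since these two sets contain infinitely many points they cannot be enumerated, but membership can still be tested point by point. The following Python sketch does this for the two circles just described; the function names and the sample points are assumptions chosen for illustration.

```python
def in_A(x, y):
    """Points within and on the unit circle centred on the origin."""
    return x * x + y * y <= 1.0


def in_B(x, y):
    """Points within and on the circle of radius 2 centred on x = 2.5."""
    return (x - 2.5) ** 2 + y * y <= 4.0


def in_intersection(x, y):
    """Membership of the intersection: the point lies in both A and B."""
    return in_A(x, y) and in_B(x, y)


def in_union(x, y):
    """Membership of the union: the point lies in A, in B, or in both."""
    return in_A(x, y) or in_B(x, y)


# A few sample points on and off the x-axis.
for point in [(0.75, 0.0), (0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]:
    print(point, in_intersection(*point), in_union(*point))
```

Defining a set by a membership test in this way is the computational counterpart of the notation in which the criterion appears to the right of the vertical rule.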
Similarly, if we consider the sets A and B defined by the interiors and
boundaries of the two unit circles illustrated in Figs 1-3 (a), (b), we see that in
(a), $A \cap B = \{1\}$, so that only the single point $x = 1$ on the x-axis is common
to A and B, whereas in (b), $A \cap B = \phi$.
Fig. 1-3 Intersection of sets in the plane: (a) single point contained in intersection, $A \cap B = \{1\}$; (b) disjoint sets, $A \cap B = \phi$.
A final idea we now introduce in connection with sets A and B is the
complement of B relative to A, which we shall write as $A \setminus B$. This is a
generalization of the notion of subtraction and comprises the set of elements of A
that do not belong to B. The expression $A \setminus B$ is usually read 'A minus B'
and if, for example, $A = \{a, 1, 3, 7\}$ and $B = \{a, 7, 9, 11\}$ then $A \setminus B = \{1, 3\}$.
Appealing again to a Venn diagram, we illustrate this relationship by the
shaded region in Fig. 1-4.
Fig. 1-4 Symbolic representation of complement of B relative to A, $A \setminus B$.
The following useful results are almost self-evident and are true for
arbitrary sets A, B, and C. They may be proved either from the basic defini-
tions, or by appeal to Venn diagrams.
Basic set operations
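$A \cup A = A \cap A = A$, (1-1)
$A \cap B = B \cap A$, (1-2)
$A \cup B = B \cup A$, (1-3)
$(A \cup B) \cup C = A \cup (B \cup C)$, (1-4)
$(A \cap B) \cap C = A \cap (B \cap C)$, (1-5)
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$, (1-6)
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$. (1-7)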
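Results (1-1) to (1-7) can be spot-checked by machine. The Python sketch below tests them for every triple of subsets of a three-element set; the helper names `subsets` and `laws_hold` and the choice of universe {1, 2, 3} are assumptions, and such a finite check illustrates the results without proving them for arbitrary sets.

```python
from itertools import combinations, product


def subsets(universe):
    """Return every subset of a finite universe as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def laws_hold(A, B, C):
    """Check results (1-1) to (1-7) for one particular triple of sets."""
    return all([
        A | A == A and A & A == A,          # (1-1)
        A & B == B & A,                     # (1-2)
        A | B == B | A,                     # (1-3)
        (A | B) | C == A | (B | C),         # (1-4)
        (A & B) & C == A & (B & C),         # (1-5)
        A | (B & C) == (A | B) & (A | C),   # (1-6)
        A & (B | C) == (A & B) | (A & C),   # (1-7)
    ])


# Try all 8 x 8 x 8 triples of subsets of the illustrative universe.
all_ok = all(laws_hold(A, B, C)
             for A, B, C in product(subsets({1, 2, 3}), repeat=3))
print(all_ok)
```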
From these there follows an important theorem due to De Morgan:
Theorem 1-1 For any three arbitrary sets A, B, and C it is true that
$A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$
and
$A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$.
Proof An analytical proof of the first stated result involves the following
two steps: (a) the proof that if x is an arbitrary element such that
$x \in A \setminus (B \cup C)$, then $x \in (A \setminus B)$ and $x \in (A \setminus C)$, showing that
$A \setminus (B \cup C) \subseteq (A \setminus B) \cap (A \setminus C)$;
and (b) the proof that if $x \in (A \setminus B)$ and $x \in (A \setminus C)$, then $x \in A \setminus (B \cup C)$,
showing that
$(A \setminus B) \cap (A \setminus C) \subseteq A \setminus (B \cup C)$.
Then by our alternative definition of the equality of two sets P and Q,
whereby $P = Q$ if $P \subseteq Q$ and $Q \subseteq P$, the result will follow. The details,
which are not difficult, are left to the reader. The proof of the second stated
result follows on similar lines.
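Both statements of Theorem 1-1 may be checked numerically on small sets, using Python's `-` operator for the complement of one set relative to another. This is a sketch under assumed names (`subsets`, `de_morgan_holds`) and an assumed universe {1, 2, 3}; it is a finite check and no substitute for the analytical proof.

```python
from itertools import combinations, product


def subsets(universe):
    """Return every subset of a finite universe as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def de_morgan_holds(A, B, C):
    """Check both statements of the theorem for one triple of sets."""
    first = A - (B | C) == (A - B) & (A - C)
    second = A - (B & C) == (A - B) | (A - C)
    return first and second


# The relative complement example from the text, with 'a' as a string.
print({"a", 1, 3, 7} - {"a", 7, 9, 11})

theorem_ok = all(de_morgan_holds(A, B, C)
                 for A, B, C in product(subsets({1, 2, 3}), repeat=3))
print(theorem_ok)
```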
The theorem may be illustrated in general terms, and proved for sets
which may be represented by points in a plane, by the use of Venn diagrams.
The three diagrams appropriate to the first stated result are shown in Figs
1-5 (a), (b), and (c), where the shaded regions represent the sets $A \setminus B$, $A \setminus C$,
and $A \setminus (B \cup C)$, respectively.
Fig. 1-5 Representation of De Morgan's theorem: (a) $A \setminus B$; (b) $A \setminus C$; (c) $A \setminus (B \cup C)$.
The reader will have noticed that it is a feature of basic set operations
that they essentially combine two sets to generate a third in an unambiguous
manner. It is because of this simple property that operations such as union
and intersection are called binary set operations, the term 'binary' referring
to the two sets on which the set operation is performed to generate the third.
Thus the operation $\cap$ acting on any two sets A and B generates a third set
$C = A \cap B$ where, of course, C will be the null set if A and B have no common
elements.
Theorem 1-1 illustrates that operations on sets are not always as simple
as the formation of the union or intersection of sets. Accordingly, it is neces-
sary to appreciate clearly the implication of any statement that may be made
in the derivation of a result. These statements may either be 'one way' implica-
tions or 'two way' implications in the following sense. An implication will
be said to be one way if it is a simple statement of the form 'result A implies
result B'. This statement is usually written symbolically in the concise form
$A \Rightarrow B$.
A two way implication arises if from the above statement it also follows
that 'result B implies result A', so that in addition to the previous statement
it is also permissible to write
$B \Rightarrow A$.
Rather than write for a two way implication the two results $A \Rightarrow B$ and
$B \Rightarrow A$, the notation is contracted so that the two way implication may be
written concisely in the form
$A \Leftrightarrow B$.
The symbol $\Leftrightarrow$ is usually read 'implies and is implied by'.
Two simple illustrations using sets of integers should clarify these remarks.
We can only write
$a = 1 \Rightarrow a$ is an integer,
since the converse statement, a is an integer, does not imply that a = 1.
However, we may obviously write
integer $n$ contains a factor 2 $\Leftrightarrow$ $n$ is an even integer.
Formal development of these and similar ideas is essential if the logical
structure of mathematics is to be fully appreciated, though these matters
will not be pursued further in this introductory account.
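The difference between one way and two way implications in the two integer illustrations can be explored by testing them over a finite range. The Python sketch below is illustrative only: the helper `implies`, the predicate names, and the range of integers from -20 to 20 are assumptions, and agreement over a finite range is evidence rather than proof.

```python
def implies(p, q, domain):
    """True if p(x) implies q(x) for every x in the finite domain."""
    return all(q(x) for x in domain if p(x))


def is_one(a):
    return a == 1


def is_integer(a):
    return isinstance(a, int)


def has_factor_2(n):
    # n = 2k for some integer k with |k| <= |n|.
    return any(n == 2 * k for k in range(-abs(n), abs(n) + 1))


def is_even(n):
    return n % 2 == 0


domain = range(-20, 21)

one_way = implies(is_one, is_integer, domain)      # a = 1 => a is an integer
converse = implies(is_integer, is_one, domain)     # fails, e.g. for a = 2
two_way = (implies(has_factor_2, is_even, domain)
           and implies(is_even, has_factor_2, domain))
print(one_way, converse, two_way)
```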
1-2 Set theory and probability
One of the most direct applications of the elements of set theory is to be
found in a formal introduction to probability theory. Because the notion of
a probability is fundamental to many branches of engineering and science we
choose to introduce some basic ideas and definitions now, making full use
of the notions of set theory. This will serve a dual purpose in that it will
provide an excellent illustration of a specific application of set theory, whilst
at the same time introducing an important concept at the very outset of our
study.
In some situations the outcome of an experiment is not determinate, so
one of several possible events may occur. Following statistical practice we
shall refer to an individual event of this kind as the result or outcome of a
trial, whereas an agreed number of trials, say N, will be said to constitute an
experiment. If an experiment comprises throwing a die N times, then a trial
would involve throwing it once and the outcome of a trial would be the score
that was recorded as a result of the throw. The experiment would involve
recording the outcome of each of the N trials.
In general, if a trial has m outcomes we shall denote them by $E_1, E_2, \ldots, E_m$
and refer to each as a simple event. Hence a trial involving tossing a coin
would have only two simple events as outcomes: namely 'heads', which
could be labelled $E_1$, and 'tails', which would then be labelled $E_2$. In this
instance an experiment would be a record of the outcomes from a given
number of such trials. A typical record of an experiment involving tossing a
coin eight times would be $E_1, E_2, E_1, E_1, E_1, E_2, E_2, E_1$. With such a simple
experiment the $E_1$, $E_2$ notation has no apparent advantage over writing H in
place of $E_1$ and T in place of $E_2$ to obtain the equivalent record H, T, H, H,
H, T, T, H. The advantage of the $E_i$ notation accrues from the fact that the
subscript attached to the E may be ordered numerically, thereby enabling
easier manipulation of the outcomes during analysis.
Events such as the result of tossing a coin or throwing a die are called
chance or random events, since they are indeterminate and are supposedly
the consequence of unbiased chance effects. Experience suggests that the
relative frequency of occurrence of each such event averaged over a series of
similar experiments tends to a definite value as the number of experiments
increases.
The relative frequency of occurrence of the simple event $E_i$ in a series of
N trials is thus given by the expression
$\dfrac{\text{Number of occurrences of event } E_i}{N}$.
By virtue of its definition, this ratio must lie between zero and unity
inclusive. For any given N, this ratio provides an estimate of the theoretical
ratio that would have been obtained were N to have been made arbitrarily
large. This theoretical ratio will be called the probability of occurrence of
event $E_i$ and will be written $P(E_i)$. In many simple situations its value may be
arrived at by making reasonable postulates concerning the mechanisms
involved in a trial. Thus when fairly tossing an unbiased coin it would be
reasonable to suppose that over a large number of trials the number of
'heads' would closely approximate the number of 'tails' so that $P(H) = P(T) = \tfrac{1}{2}$.
Here, of course, P(H) signifies the probability of occurrence of a 'head'
and P(T) signifies the probability of occurrence of a 'tail'.
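The way in which the relative frequency settles down as N grows can be observed in a simulated experiment. The Python sketch below uses a pseudo-random generator as a stand-in for a fair coin; the function name, the seed, and the chosen values of N are assumptions, and a simulation of this kind illustrates the definition rather than justifying it.

```python
import random


def relative_frequency_of_heads(n_trials, seed=None):
    """Toss a simulated unbiased coin n_trials times; return heads / N."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_trials) if rng.random() < 0.5)
    return heads / n_trials


# The estimate is expected to move closer to 1/2 as N increases.
for n in (10, 1000, 100000):
    print(n, relative_frequency_of_heads(n, seed=1))
```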
If there are $m$ outcomes $E_1, E_2, \ldots, E_m$ of a trial, and they occur with
the respective frequencies $n_1, n_2, \ldots, n_m$ in a series of $N$ trials, then we have
the obvious identity
$$\frac{n_1 + n_2 + \cdots + n_m}{N} = 1.$$
When $N$ becomes arbitrarily large we may interpret each of the relative
frequency ratios $n_i/N$ ($i = 1, 2, \ldots, m$) occurring on the left-hand side as
the probability of occurrence $P(E_i)$ of event $E_i$, thereby giving rise to the
general result
$$P(E_1) + P(E_2) + \cdots + P(E_m) = 1. \tag{1-8}$$
By this time a careful reader will have noticed that the definition of
probability adopted here has a logical difficulty associated with it, namely,
the question whether a relative frequency ratio such as $n_i/N$ can be said to
approach a definite number as N becomes arbitrarily large. We shall not
attempt to discuss this philosophical point more fully, but rather be content
that our simple definition in terms of the relative frequency ratio is in accord
with everyday experience.
An examination of Eqn (1-8) and its associated relative frequency ratios
is instructive. It shows the obvious results that:
(a) if event $E_i$ never occurs, then $n_i = 0$ and $P(E_i) = 0$;
(b) if event $E_i$ is certain to occur, then $n_i = N$ and $P(E_i) = 1$;
(c) if event $E_i$ occurs less frequently than event $E_j$, then $n_i < n_j$ and
$P(E_i) < P(E_j)$;
(d) if the $m$ possible events $E_1, E_2, \ldots, E_m$ occur with equal frequency,
then $n_1 = n_2 = \cdots = n_m = N/m$ and $P(E_1) = P(E_2) = \cdots = P(E_m)
= 1/m$.
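The relative frequency ratio can be illustrated with a minimal Python sketch that tallies the eight-toss record quoted earlier in this section. The function name `relative_frequencies` and the string labels `"E1"` and `"E2"` are assumptions introduced for this illustration only.

```python
from collections import Counter
from fractions import Fraction

def relative_frequencies(outcomes):
    """Return n_i / N for each distinct outcome in a record of N trials."""
    n_trials = len(outcomes)
    counts = Counter(outcomes)  # n_i for each outcome E_i
    return {event: Fraction(n_i, n_trials) for event, n_i in counts.items()}

# The eight-toss record from the text: E1 = 'heads', E2 = 'tails'.
record = ["E1", "E2", "E1", "E1", "E1", "E2", "E2", "E1"]
freqs = relative_frequencies(record)

print(freqs["E1"], freqs["E2"])  # relative frequencies of heads and tails
print(sum(freqs.values()))       # the ratios n_i / N always sum to 1
```

With only eight trials the ratios are rough estimates of the probabilities; the identity that they sum to unity holds for any record, whatever its length.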
The relationship between sets and probability begins to emerge once it is
appreciated that a trial having m different outcomes is simply a rule by which
an event may be classified unambiguously as belonging to one of m different
sets. Often a geometrical analogy may be used to advantage when representing
the different outcomes of a particular trial and such an approach then leads
directly to a representation closely approximating the Venn diagrams of the
previous section.
A convenient example is provided by the simple experiment which involves
throwing two dice and recording their individual scores. There will be in
all 36 possible outcomes which may be recorded as the ordered number
pairs (1, 1), (1, 2), (1, 3), . . ., (2, 1), (2, 2), . . ., (6, 5), (6, 6). Here the first
integer in the ordered number pair represents the score on die 1 and the second
the score on die 2. These may be plotted as 36 points with integer coordinates
as shown in Fig. 1-6 (a).
Fig. 1-6 Sample space for two dice: (a) complete sample space; (b) sample space
for specific outcome. [Both panels plot the score on die 1 horizontally against
the score on die 2 vertically.]
Because each of the indicated points in Fig. 1-6 (a) lies in a two-dimensional
geometrical space (that is, they are specific points in a plane),
and in their totality they describe all possible outcomes, the representation is
usually called the sample space of events. The probability of occurrence of
an event characterized by a point in the sample space is, of course, the
probability of occurrence of the simple event it represents.
As a sample space will require a 'dimension' for each of its variables it
is immediately apparent that only in simple cases can it be represented
graphically. Nevertheless the idea is still useful, as was that of the Venn
diagram even when it was only symbolic.
The points in the sample space may be regarded as defining points in a
set D so that specific requirements as to the outcome of a trial will define a
subset A of D, at each point of which the required event will occur. Typical
of this situation would be the case in which a simple event is the throw of two
dice, and the requirement defining the subset is that the combined score after
throwing the two dice equals or exceeds 8. Here the set D would be the 36
points within the square in Fig. 1-6 (b) and the set A the 15 points within
the triangle. Using set notation we may write $A \subset D$.
The sample space representation becomes particularly valuable when
trials are considered whose outcome depends on the combination of events
belonging to two different subsets A and B of the sample space. Thus, again
using our previous example and taking for A the points within the triangle in
Fig. 1-6 (b), the points in B might be determined by the requirement that the
combined score be divisible by the integer 3. The set of points B is then those
contained within the dotted curves of Fig. 1-6 (b).
A new set C may be derived from two sets A and B in two essentially
different ways according as :
(a) C contains points in A or B or both;
(b) C contains points in A and B.
If desired, these statements about sets may be rewritten as statements
about events. This is so because there is an unambiguous relationship between
an event and the set of points $S$ in the sample space at which that event occurs.
Thus, for example, we may paraphrase the first statement by saying, the event
corresponding to points in C denotes the occurrence of the events corresponding
to points in A or B, or both. Because of this relationship it is often convenient
to regard an event and the subset of points it defines in the sample space as
being synonymous.
The statements provide yet another connection with set theory, since in
(a) we may obviously write $C = A \cup B$, whereas in (b) we must write
$C = A \cap B$. In terms of the sets $A$ and $B$ defined in connection with Fig.
1-6 (b), the set $C = A \cup B$ contains the points in the triangle together with
those within the two dotted curves exterior to the triangle. The set $C = A \cap B$
contains only the five points within the two dotted curves lying inside the
triangle.
Here it should be remarked that the statistician usually avoids the set
theory symbols $\cup$ and $\cap$, preferring instead to denote the union of $A$ and $B$
by $A + B$ and their intersection by $AB$. This largely arises because of the
duality we have already mentioned that exists between an event and the set
of points it defines; the statistician naturally preferring to think in terms of
events rather than sets. However, to emphasize the connection with set
theory we shall preserve the set theory notation.
Using this duality we now denote by P(A) the probability that an event
corresponding to a point in the sample space lies within subset A, and define
its value to be as follows:
Definition 1-1
$P(A)$ is the sum of the probabilities associated with every point belonging
to the subset $A$.
In Fig. 1-6 (b) the set A contains the 15 points within the triangle and,
since for unbiased dice each point in the sample space is equally probable,
it follows at once that the probability 1/36 is to be associated with each of
these points. Hence from our definition we see that in this case, $P(A)
= 15 \times (1/36) = 5/12$. Similarly, for the set $B$ comprising the 12 points
contained within the dotted curves we have $P(B) = 12 \times (1/36) = 1/3$.
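The definition of $P(A)$ as a sum over sample-space points can be checked by direct enumeration. The following is a minimal Python sketch under the stated assumption of unbiased dice; the names `D`, `point_probability` and `probability` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space D: all ordered pairs (score on die 1, score on die 2).
D = set(product(range(1, 7), repeat=2))
point_probability = Fraction(1, len(D))  # unbiased dice: each point has 1/36

def probability(subset):
    """Sum the probabilities attached to every point of the subset."""
    return sum((point_probability for _ in subset), Fraction(0))

A = {pt for pt in D if sum(pt) >= 8}      # combined score equals or exceeds 8
B = {pt for pt in D if sum(pt) % 3 == 0}  # combined score divisible by 3

print(len(A), probability(A))
print(len(B), probability(B))
```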
We can now introduce the idea of a conditional probability through the
following definition.
Definition 1-2
$P(A \mid B)$ is the conditional probability that an event known to be associated
with set $B$ is also associated with set $A$.
Clearly we are only interested in the relationship that exists between $A$ and
$B$, with $B$ now playing the part of a sample space. Because in Definition 1-2
$B$ plays the part of a sample space, but is itself only a subset of the complete
sample space, it is sometimes given the name of the reduced sample space.
In terms of set theory Definition 1-2 is easily seen to be equivalent to
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \tag{1-9}$$
provided $P(B) \neq 0$, which immediately shows us how $P(A \mid B)$ may be computed. Namely,
$P(A \mid B)$ is obtained by dividing the sum of the probabilities at points belonging
to the intersection $A \cap B$ of sets $A$ and $B$ by the sum of the probabilities at
points belonging to $B$. This ensures that $P(B \mid B) = 1$ as would be expected.
We can illustrate this by again appealing to the sets A and B defined in
connection with Fig. 1-6 (b). It has already been established that $P(B) = 1/3$,
and since there are only five points in $A \cap B$, each with a probability 1/36,
it follows that $P(A \cap B) = 5/36$. Hence $P(A \mid B) = (5/36)/(1/3) = 5/12$. This
result expressed in words states that when two dice are thrown and their
score is divisible by the integer 3, then the probability that it also equals or
exceeds 8 is 5/12.
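The conditional probability just computed can be obtained in two equivalent ways: from the ratio in Eqn (1-9), or by counting the points of $A$ that lie inside the reduced sample space $B$. The sketch below, with illustrative names `prob`, `p_a_given_b` and `p_reduced` that are not part of the text, shows both under the assumption of equally likely points.

```python
from fractions import Fraction
from itertools import product

D = set(product(range(1, 7), repeat=2))   # 36 equally likely points
A = {pt for pt in D if sum(pt) >= 8}      # score equals or exceeds 8
B = {pt for pt in D if sum(pt) % 3 == 0}  # score divisible by 3

def prob(subset):
    """Probability of a subset when every point of D is equally likely."""
    return Fraction(len(subset), len(D))

# Eqn (1-9): P(A|B) = P(A n B) / P(B).
p_a_given_b = prob(A & B) / prob(B)

# Equivalent view: treat B as the sample space and count the points of A in it.
p_reduced = Fraction(len(A & B), len(B))

print(p_a_given_b, p_reduced)
```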
A direct consequence of Eqn (1-9) is the so-called probability multiplication
rule:
Theorem 1-2 If two events define subsets $A$ and $B$ of a sample space, then
$$P(A \cap B) = P(B)\,P(A \mid B).$$
Sometimes, when it is given that the event corresponding to points in
subset $B$ occurs, it is also true that $P(A \mid B)$ depends only on $A$, so that
$P(A \mid B) = P(A)$. The events giving rise to subsets $A$ and $B$ will then be said
to be independent. The probability multiplication rule then simplifies in an
obvious manner which we express as follows:
Corollary 1-2 If the events giving rise to subsets $A$ and $B$ of a sample space
are independent, then
$$P(A \cap B) = P(A)\,P(B).$$
Consideration of the interpretation of $P(A \cup B)$ leads to another important
result known as the probability addition rule:
Theorem 1-3 If two events define subsets $A$ and $B$ of a sample space, then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
The proof of this theorem is self-evident once it is remarked that when
computing $P(A)$ and $P(B)$ from subsets $A$ and $B$ and then forming the expression
$P(A) + P(B)$, the sum of probabilities at points in the intersection
$A \cap B$ is counted twice. Hence $P(A) + P(B)$ exceeds $P(A \cup B)$ by an amount
$P(A \cap B)$.
The probability addition rule also has an important special case when
sets $A$ and $B$ are disjoint so that $A \cap B = \emptyset$. When this occurs the events
corresponding to sets $A$ and $B$ are said to be mutually exclusive and we express
the result as follows:
Corollary 1-3 If the events giving rise to subsets $A$ and $B$ of a sample space
are mutually exclusive, then
$$P(A \cup B) = P(A) + P(B).$$
As a simple illustration of Theorem 1-3 we again use the sets $A$ and $B$
defined in connection with Fig. 1-6 (b) to compute $P(A \cup B)$. The result is
immediate for we have already obtained the results $P(A) = 5/12$, $P(B) = 1/3$,
and $P(A \cap B) = 5/36$, so from Theorem 1-3 follows the result
$$P(A \cup B) = 5/12 + 1/3 - 5/36 = 11/18.$$
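Written over the common denominator 36, the arithmetic of this illustration runs as follows; the point count in the closing remark is a supplementary cross-check rather than part of the original argument.

```latex
\[
P(A \cup B) = \frac{15}{36} + \frac{12}{36} - \frac{5}{36}
            = \frac{22}{36} = \frac{11}{18}.
\]
```

This agrees with a direct count of the sample space: of the 36 points, 15 + 12 - 5 = 22 lie in the triangle or within the dotted curves.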
The applications of these theorems and their corollaries are well illustrated
by the following simple examples.
Example 1-1 A bag contains a very large number of red and black balls in
the ratio 1 red ball to 4 black. If 2 balls are drawn successively from the bag
at random, what is the probability of selecting
(a) 2 red balls,
(b) 2 black balls,
(c) 1 red and 1 black ball?
Let $A_1$ denote the selection of a red ball first (and either colour second),
and $A_2$ the selection of a red ball second (and either colour first), with
$B_1$ and $B_2$ defined in the same way for a black ball. Then
$A_1 \cap A_2$ is the selection of 2 red balls and, similarly, $B_1 \cap B_2$ is the selection
of 2 black balls. As the balls occur in the ratio 1 red : 4 black it follows that
their relative frequency ratios are 1/5 for a red ball and 4/5 for a black ball, so
$P(A_1) = 1/5$ and $P(B_1) = 4/5$.
The fact that the bag contains a large number of balls implies that the
drawing of one or more balls does not materially alter the relative frequency
ratio that existed at the start, so $P(A_2) = 1/5$ and $P(B_2) = 4/5$. This, together
with the fact that the balls are drawn at random, implies that the drawing of
each ball is an independent event. The independence of the first and second
draws then allows the use of Corollary 1-2 to determine the required solutions
to (a) and (b). We find that
(a) $P(A_1 \cap A_2) = (1/5)(1/5) = 1/25$,
(b) $P(B_1 \cap B_2) = (4/5)(4/5) = 16/25$.
Now to answer (c) we notice that there are two mutually exclusive orders
in which a red and a black ball may be selected, namely as the event $C \cup D$,
where $C = A_1 \cap B_2$ (red then black) and $D = B_1 \cap A_2$ (black then red).
From Corollary 1-3 we then have that $P(C \cup D) = P(C) + P(D)$, where
$P(C)$ and $P(D)$ are determined by Corollary 1-2. This shows that $P(C)
= P(A_1)P(B_2)$ and $P(D) = P(B_1)P(A_2)$, so that $P(C) = P(D) = (1/5)(4/5)
= 4/25$. The solution to (c) becomes
$$P(C \cup D) = 4/25 + 4/25 = 8/25.$$
The three forms of selection (a), (b), and (c) are themselves mutually
exclusive and it must follow that $P(A_1 \cap A_2) + P(B_1 \cap B_2) + P(C \cup D) = 1$,
as is readily checked. Indeed this result could have been used directly to
calculate $P(C \cup D)$ from $P(A_1 \cap A_2)$ and $P(B_1 \cap B_2)$ in place of the above
argument using Corollary 1-3.
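The calculation in Example 1-1 can be reproduced with exact fractions. This is a minimal sketch assuming, as the example does, that successive draws are independent; the variable names are illustrative.

```python
from fractions import Fraction

p_red, p_black = Fraction(1, 5), Fraction(4, 5)  # ratio 1 red : 4 black

# Independent draws (Corollary 1-2): multiply the probabilities.
p_two_red = p_red * p_red
p_two_black = p_black * p_black

# One of each colour: two mutually exclusive orders (Corollary 1-3).
p_one_each = p_red * p_black + p_black * p_red

# The three forms of selection exhaust all possibilities.
total = p_two_red + p_two_black + p_one_each
print(p_two_red, p_two_black, p_one_each, total)
```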
The previous situation becomes slightly more complicated if only a
limited number of balls are contained in the bag.
Example 1-2 A bag contains 50 balls of which 10 are red and the remainder
black. If 2 balls are drawn successively from the bag at random, what is the
probability of selecting
(a) 2 red balls,
(b) 2 black balls,
(c) 1 red and 1 black ball ?
This time the approach must be slightly different because, unlike Example
1-1, the removal of a ball from the bag now materially alters the probabilities
involved when the next ball is drawn. In fact this is a problem involving
conditional probabilities.
Here we shall define $A$ to be the event that the first ball selected is red,
and $B$ to be the event that the second ball selected is red. Expressed in set
notation we have to find $P(A \cap B)$, the probability of occurrence of the event
associated with $A \cap B$. Its evaluation requires the probability of occurrence
of event $B$ given that event $A$ has occurred. This is a conditional probability
with the set associated with event $A$ playing the role of the reduced sample
space. Utilizing this observation we now make use of Theorem 1-2 to write
$$P(A \cap B) = P(A)\,P(B \mid A).$$
Now the relative frequency of occurrence of a red ball at the first draw is
10/(10 + 40) = 1/5, so that P(A) = 1/5. (Not till later will we use the fact
that the relative frequency of occurrence of a black ball is 40/(10 + 40)
= 4/5.)
Given that a red ball has been drawn, 9 red balls and 40 black balls remain
in the bag. If the next ball to be drawn is red then its probability of occurrence
is the conditional probability $P(B \mid A) = 9/(9 + 40) = 9/49$. Hence it follows
that the solution to (a) is
$$P(A \cap B) = (1/5)(9/49) = 9/245.$$
It is interesting to compare this with the value 1/25 that was obtained in
Example 1-1 on the assumption that there was virtually an infinite number of
balls in the bag.
If $C$ is defined to be the event that the first ball drawn is black and $D$ the
event that the second ball drawn is black, then to answer (b) we must compute
$P(C \cap D)$. Obviously, $P(C) = 4/5$, and by using an argument analogous to
that above it follows that $P(D \mid C) = 39/(10 + 39) = 39/49$. Hence the solution
to question (b) is
$$P(C \cap D) = (4/5)(39/49) = 156/245.$$
Again this should be compared with the value 16/25 obtained in Example
1-1.
The simplest way to answer (c) is to use the fact that events (a), (b), and
(c) are mutually exclusive and describe the only possibilities. Hence the
sum of the three probabilities must equal unity. Denoting the probability of
event (c) by $P$ we have
$$P = 1 - P(A \cap B) - P(C \cap D),$$
showing that $P = 1 - 9/245 - 156/245 = 16/49$.
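The results of Example 1-2 can be cross-checked by brute force, listing every ordered pair of distinct balls that could be drawn. This is a sketch under the assumption that each ordered pair is equally likely; the helper `prob` and the list `balls` are names introduced here, and the enumeration is a different route from the conditional-probability argument of the text.

```python
from fractions import Fraction
from itertools import permutations

balls = ["red"] * 10 + ["black"] * 40

# Every ordered pair of distinct balls is one equally likely outcome.
draws = list(permutations(range(len(balls)), 2))

def prob(first, second):
    """Probability that the first and second balls have the given colours."""
    hits = sum(1 for i, j in draws if balls[i] == first and balls[j] == second)
    return Fraction(hits, len(draws))

p_two_red = prob("red", "red")
p_two_black = prob("black", "black")
p_one_each = prob("red", "black") + prob("black", "red")
print(p_two_red, p_two_black, p_one_each)
```

Enumeration is practical here only because the bag is small; the multiplication rule of Theorem 1-2 gives the same values without listing outcomes.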
It is sometimes helpful to bear in mind the following table in which
equivalent statements are expressed using the alternative languages of sets
and probability theory.
Sets                     Probability
$A \cup B = C$           $A + B = C$; the event corresponding to $C$ is defined as
                         the occurrence of at least one of the events
                         corresponding to $A$ or $B$ or both.
$A \cap B = C$           $AB = C$; the event corresponding to $C$ is defined as
                         the occurrence of both of the events corresponding to
                         $A$ and $B$.
$A \cap B = \emptyset$   $AB = 0$; events corresponding to $A$ and $B$ are mutually
                         exclusive.
$A = \emptyset$          $A = 0$; the event corresponding to $A$ does not occur.
$B \subset A$            $B \Rightarrow A$; the event corresponding to $B$ implies that
                         corresponding to $A$.
$A \setminus B$          the event corresponding to $A$ and not that
                         corresponding to $B$.
To close this section with a brief examination of repeated trials, the ideas
of a permutation and a combination must be utilized. The student will already
be familiar with these concepts from elementary combinatorial algebra and
so we shall only record two definitions.
Definition 1-3 A permutation of a set of $n$ mutually distinguishable
objects $r$ at a time is an arrangement, or an enumeration of the objects, in
which their order of appearance counts.
Thus of the five letters a, b, c, d, e the arrangements a, b, c and a, c, b
represent two different permutations of three of the five letters. These are
described as permutations of five letters taken three at a time. Other permuta-
tions of this kind may be obtained by further re-arrangement of the letters
a, b, c and by the replacement of any of them by either or both of the
remaining two letters d and e.
The total number of different permutations of $n$ objects $r$ at a time will be
denoted by ${}^{n}P_{r}$ and it is left to the reader to prove as an exercise that
$${}^{n}P_{r} = \frac{n!}{(n-r)!}, \tag{1-10}$$
where $n!$ (factorial $n$) $= n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1$, and we adopt the
convention that $0! = 1$.
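Eqn (1-10) can be checked against an explicit listing for the five letters used above. The sketch below is illustrative only; the function name `n_p_r` is an assumption, while `itertools.permutations` is the standard-library routine that generates the arrangements.

```python
from itertools import permutations
from math import factorial

def n_p_r(n, r):
    """Eqn (1-10): number of permutations of n objects taken r at a time."""
    return factorial(n) // factorial(n - r)

letters = "abcde"
arrangements = list(permutations(letters, 3))  # ordered triples of letters

print(n_p_r(5, 3), len(arrangements))
```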
Definition 1-4 A combination of a set of $n$ mutually distinguishable
objects $r$ at a time is a selection of $r$ objects from the $n$ without regard to
their order of arrangement.
It follows from the definition of a permutation that a set of $r$ objects may
be arranged in $r!$ different ways so that, denoting the number of different
combinations of $n$ objects $r$ at a time by $\binom{n}{r}$, we must have
$${}^{n}P_{r} = r!\binom{n}{r}.$$
This gives the important result
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}. \tag{1-11}$$
In many books it will often be found that the expression ${}^{n}C_{r}$ is written in
place of $\binom{n}{r}$. The numbers $\binom{n}{r}$ are usually called binomial coefficients
because of their occurrence in the binomial expansion
$$(p + q)^n = \sum_{r=0}^{n} \binom{n}{r} p^r q^{n-r}, \quad \text{with } n \text{ a positive integer.} \tag{1-12}$$
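The relation between permutations and combinations, and the binomial expansion, can both be verified numerically. The following sketch uses exact fractions; the function names `binom` and `binomial_expansion` and the test values of `p` and `q` are arbitrary choices made for this illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def binom(n, r):
    """Eqn (1-11): n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

def binomial_expansion(p, q, n):
    """Right-hand side of the binomial expansion, summed from r = 0 to n."""
    return sum(binom(n, r) * p**r * q**(n - r) for r in range(n + 1))

selections = list(combinations("abcde", 3))  # unordered choices of 3 letters
p, q = Fraction(2, 7), Fraction(3, 11)       # arbitrary test values

print(len(selections), binom(5, 3))
print(binomial_expansion(p, q, 4) == (p + q) ** 4)
```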
Now consider an experiment involving a series of independent trials in
each of which only one of two events $A$ or $B$ may occur. Then if the
probabilities of occurrence of events $A$ and $B$ are $p$ and $q$, respectively, we must
obviously have $p + q = 1$. If $n$ such trials constitute an experiment, we might
wish to know with what probability the experiment may be expected to yield
$r$ events of type $A$. The statistician will call such a situation repeated
independent trials.
An experiment will be deemed to be successful if $r$ events of type $A$ and
$n - r$ events of type $B$ occur, irrespective of their order of occurrence.
Clearly this can happen in $\binom{n}{r}$ different ways and by Corollary 1-2, since
the trials are independent, the probability of occurrence of any one of these
ways will be $p^r(1 - p)^{n-r}$. Hence, as these different ways are also mutually
exclusive, it follows from Corollary 1-3 that the required probability $P(r)$ of
occurrence of $r$ events of type $A$, each with probability of occurrence $p$, in $n$
independent trials is
$$P(r) = \binom{n}{r} p^r (1 - p)^{n-r}. \tag{1-13}$$
Identifying the $p$ and $q$ of Eqn (1-12) with the probabilities of occurrence
of the events $A$ and $B$ just discussed, we see that $q = 1 - p$, so that Eqn (1-12)
takes the form
$$1 = \sum_{r=0}^{n} \binom{n}{r} p^r (1 - p)^{n-r}. \tag{1-14}$$
Each term on the right-hand side of Eqn (1-14) then represents the probability
of occurrence of an event of the form just discussed. For example, the first
term
$$P(0) = \binom{n}{0} (1 - p)^n$$
is the probability that event $A$ will never occur in a series of $n$ independent
trials, whilst the third term
$$P(2) = \binom{n}{2} p^2 (1 - p)^{n-2}$$
is the probability that event $A$ will occur exactly twice in a series of $n$
independent trials.
The $n + 1$ numbers $P(r)$, $r = 0, 1, \ldots, n$ have, by definition, the property
that
$$P(0) + P(1) + \cdots + P(n) = 1, \tag{1-15}$$
and they are said to define a discrete probability distribution. It is
conventional to plot them in histogram fashion when they illustrate the probabilities
to be associated with the $n + 1$ possible outcomes of an experiment involving
$n$ trials. Fig. 1-7 (a) illustrates the case in which $n = 4$ and $p = 1/4$ so that
$$P(0) = \binom{4}{0}\left(\tfrac{1}{4}\right)^0\left(\tfrac{3}{4}\right)^4 = \frac{81}{256}, \qquad P(1) = \binom{4}{1}\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right)^3 = \frac{27}{64},$$
and, similarly, $P(2) = 54/256$, $P(3) = 3/64$, and $P(4) = 1/256$. Because of the origin of this
distribution, Eqn (1-13) is said to define the binomial distribution. This
distribution is historically associated with Jacob Bernoulli (1654-1705) and
experiments of the type just examined are sometimes referred to as Bernoullian
trials. When the cumulative total
$$U(r) = \sum_{t=0}^{r} P(t), \tag{1-16}$$
is plotted in histogram fashion against $r$ the result is called the cumulative
distribution function. The cumulative distribution function corresponding to
Fig. 1-7 (a) is shown in Fig. 1-7 (b). It is conventional to refer to the $P(r)$
as the probability density function or the frequency function since it describes
the proportion of observations appropriate to the value of $r$.
Fig. 1-7 Binomial distribution ($n = 4$, $p = 1/4$): (a) binomial probability density
function $P(r)$; (b) binomial cumulative distribution function $U(r)$.
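The values plotted in Fig. 1-7 can be regenerated from Eqn (1-13) and the cumulative total of Eqn (1-16). The sketch below assumes the case n = 4 with p = 1/4, which is the value consistent with the probabilities quoted in the text; the function name `binomial_pmf` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

def binomial_pmf(n, p):
    """P(r) of Eqn (1-13) for r = 0, 1, ..., n."""
    return [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

P = binomial_pmf(4, Fraction(1, 4))
U = list(accumulate(P))  # running totals U(r) of Eqn (1-16)

for r, (p_r, u_r) in enumerate(zip(P, U)):
    print(r, p_r, u_r)
```

The final cumulative value U(n) equals unity, in agreement with Eqn (1-15).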
Example 1-3 If an unbiased coin is tossed six times, what is the probability
that only two 'heads' will occur in the sequence of results ?
As the coin is unbiased $p = q = 1/2$ and so
$$P(2) = \binom{6}{2}\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right)^4 = \frac{15}{64}.$$
It is an immediate consequence of Eqn (1-13) that
(a) if $A$ occurs with probability $p$ in independent trials then the
probability that it will occur at least $r$ times in $n$ trials is
$$\sum_{s=r}^{n} \binom{n}{s} p^s (1 - p)^{n-s};$$
(b) if $A$ occurs with probability $p$ in independent trials then the probability
that it will occur at most $r$ times in $n$ trials is
$$\sum_{s=0}^{r} \binom{n}{s} p^s (1 - p)^{n-s};$$
and to this we may add Eqn (1-13) in this form:
(c) if $A$ occurs with probability $p$ in independent trials then the
probability that it will occur exactly $r$ times in $n$ trials is
$$\binom{n}{r} p^r (1 - p)^{n-r}.$$
Example 1-4 What is the probability of hitting a target when three shells
are fired, assuming each to have a probability 1/2 of making a hit?
Obviously here $p = 1/2$ and we will have satisfied the conditions of the
question if at least one shell finds the target. Accordingly, using (a) above,
the result is
$$\sum_{s=1}^{3} \binom{3}{s} \left(\tfrac{1}{2}\right)^s \left(\tfrac{1}{2}\right)^{3-s}.$$
Hence the required probability is $3/8 + 3/8 + 1/8 = 7/8$.
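The three forms (a), (b) and (c) translate directly into short functions. The sketch below applies them to Examples 1-3 and 1-4, taking the hit probability in Example 1-4 to be 1/2 (an assumption of this sketch); the function names `exactly`, `at_least` and `at_most` are illustrative.

```python
from fractions import Fraction
from math import comb

def exactly(n, r, p):
    """Form (c): probability of exactly r occurrences in n trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def at_least(n, r, p):
    """Form (a): probability of at least r occurrences in n trials."""
    return sum(exactly(n, s, p) for s in range(r, n + 1))

def at_most(n, r, p):
    """Form (b): probability of at most r occurrences in n trials."""
    return sum(exactly(n, s, p) for s in range(0, r + 1))

half = Fraction(1, 2)
two_heads_in_six = exactly(6, 2, half)   # Example 1-3
at_least_one_hit = at_least(3, 1, half)  # Example 1-4, assuming p = 1/2
print(two_heads_in_six, at_least_one_hit)
```

An equivalent shortcut for 'at least one' is to subtract the probability of no occurrences from unity, that is, 1 - P(0).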
So far the sample spaces we have used have involved discrete points, and
it is for this reason that the term discrete has been used in conjunction with
the definition of the binomial distribution. In other words, in discrete
distributions, no meaning is to be attributed to points that are intermediate
between the discrete sample space points. In particular, referring to Fig.
1-6 (a), there is no score to be attributed to a point with horizontal
coordinate 2.5 and vertical coordinate 4.2 any more than there is to the point
with horizontal coordinate 11 and vertical coordinate 9.
However, situations occur in which perfectly satisfactory sample spaces
can be defined which associate an event with every point of the sample space,
and not just certain discrete points. The definition of a distribution function
appropriate to this case requires ideas from the calculus and will not be
discussed here. In statistics such distributions are called continuous distri-
butions.
1-3 Integers, rationals and arithmetic laws
The reader will already be familiar with the fact that if the arithmetic opera-
tion of addition is performed on the natural numbers, or the positive integers
as they are often called, the result will also be a positive integer. Written
symbolically this statement becomes $a, b \in N \Rightarrow (a + b) \in N$. However, the
arithmetic operation of subtraction is less simple, since we know from direct
experience that even when $a, b \in N$, this does not necessarily imply that
a — b is a positive integer. Indeed, in general a — b may be equal to some
positive or negative integer or to zero.
Thus an attempt always to express the result of subtraction of natural
numbers in terms of the natural numbers themselves must fail. This is usually
expressed by saying that the system of natural numbers N is not closed with
respect to subtraction. The difficulty is of course resolved by supplementing
the set of natural numbers $N$ by the set $N^* = \{\ldots, -3, -2, -1, 0\}$ of
negative integers and zero. If now in place of $N$ we use the complete set of
integers $I = N^* \cup N$ already encountered in Problem 1-1, the assertions
$a, b \in I \Rightarrow (a + b) \in I$ and $a, b \in I \Rightarrow (a - b) \in I$ become unconditionally true.
The need to generalize the notion of the natural numbers N to the com-
plete set of integers I is thus seen to arise as a natural result of seeking a
number system in which the binary arithmetic operation inverse to addition
can always be performed; namely the operation of subtraction. However, the set of
numbers $I$ is still far from adequate to enable everyday practical arithmetic
to be performed. To see this it is only necessary to comment that although the
product of two integers belonging to $I$ itself lies in $I$, the quotient of two
integers belonging to $I$ does not necessarily lie in $I$. Thus the complete set of
integers $I$ is not closed with respect to division. Symbolically we can write
this as $a, b \in I \Rightarrow ab \in I$, but $a, b \in I \Rightarrow a/b \in I$ only if $b \neq 0$ and $a = kb$
with $k \in I$. The symbol $\neq$ used here is to be read 'not equal to' and the
condition involving $k$ simply ensures that the quotient $a/b$ is integral.
Here again the operations of multiplication and division are inverse
binary arithmetic operations. To remove the artificial restriction on division,
so that the quotient of any two non-zero integers becomes a number in some
number system, we must still further extend the system $I$ of integers. This is
achieved by introducing the familiar system $R^*$ of rational numbers, which is
defined as the set of all numbers of the form $a/b$, where $b \neq 0$ and $a, b \in I$.
Obviously, since integers are just a special case of rational numbers and, for
example, 2 is represented by any of the rationals 2/1, 4/2, 10/5, . . ., the set
$R^*$ also contains all the integers and so we may write $I \subset R^*$.
Numerous though the rational numbers obviously are, we now show how
they may be arranged in a definite order and counted. One way in which this
may be achieved is indicated in the following array which recognizes as
different all rational representations in which cancelling of common factors
has not been performed. Thus, for example, in this scheme 4/2, 6/3, 8/4, . . .,
are counted as different rational numbers, despite the fact that they all
represent 2. If desired these repetitions may be omitted from the resulting
sequence of rational numbers, though the matter is not important. The
counting or enumeration of the rationals proceeds in the order indicated by
the arrows :
[Array of the rational representations, with arrows indicating the order of
enumeration; not reproduced here.]
If this form of enumeration is adopted then the first few rationals to be
specified are
0, 1, 1/2, 1, 2, -2, -1, -1/2, -1, -3/2, -3, . . ..
As already mentioned, if desired the repetitions may be deleted, so that the
start of the sequence would then become
0, 1, 1/2, 2, -2, -1, -1/2, -3/2, -3, . . ..
Clearly all rationals are included somewhere in this scheme, so that as
each one may be put into correspondence with an integer, the mathematician
is entitled to say that the rationals are countable, despite the fact that they are
infinite in number. What this construction has established is the rather
remarkable result that the rationals are no more numerous than the set of
positive integers themselves.
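Countability can also be made concrete with a short generator that lists every rational exactly once. Note that this sketch uses a different ordering from the array of the text: positive fractions in lowest terms are grouped by the sum of numerator and denominator, and each is followed by its negative. The function name `rationals` and this particular ordering are assumptions chosen for the illustration.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals():
    """Yield every rational number exactly once."""
    yield Fraction(0)
    total = 2  # numerator + denominator of the positive fractions in a group
    while True:
        for denominator in range(1, total):
            numerator = total - denominator
            if gcd(numerator, denominator) == 1:  # skip repeats such as 2/2
                yield Fraction(numerator, denominator)
                yield Fraction(-numerator, denominator)
        total += 1

first = list(islice(rationals(), 11))
print(", ".join(str(x) for x in first))
```

Because each rational appears at a definite finite position in the sequence, the generator itself is the correspondence with the positive integers.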
It might, at first sight, seem that the rationals R* must contain all possible
numbers. In fact this is far from the truth since it is possible to show that
numbers exist which are not expressible as a rational fraction and yet which
lie between two rationals, however close they may be. For obvious reasons
they are called irrational numbers, and to substantiate our assertion we now
prove the existence of one such number.
We will show that $\sqrt{2}$ is irrational or, to phrase the statement more
precisely, that there is no fraction of which the square is 2. The argument
starts from a given assumption and then produces a contradiction, thereby
showing that the original assumption must be false. It is called an argument
by contradiction and is a device frequently used in higher mathematics.
Suppose that $m/n$ is such that $m$ and $n$ are integers having no common
factor and $(m/n)^2 = 2$. Then $m^2 = 2n^2$ so that $m^2$ must be even and hence
$m$ itself is even. Because $m$ is even we may set $m = 2r$, where $r$ is some integer.
(Why?) Then $4r^2 = 2n^2$, or $2r^2 = n^2$, which now shows that $n^2$ and hence $n$
must be even. The fact that $n$ is even now allows us to set $n = 2s$ and thus the
numbers $m$ and $n$ have a common factor 2, contradicting the initial
assumption. Hence the original assumption that $\sqrt{2}$ is capable of representation in
the rational form $m/n$ is false. We have thus proved that $\sqrt{2}$ is an irrational
number.
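The argument uses twice the fact that an integer whose square is even must itself be even. One way to justify this, added here as a supplementary step, is to examine the contrary case of an odd integer m = 2k + 1, where k is an integer:

```latex
\[
m^2 = (2k + 1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1 .
\]
```

The right-hand side is odd, so the square of an odd integer is always odd. Consequently, if the square of m is even then m cannot be odd and so must be even.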
It is established in higher mathematics courses that the irrational numbers
are so much more numerous than the rationals that they cannot be enumerated.
We make no attempt to justify this claim here. Instead we refer the interested
reader to Problems 1-32 to 1-35 if he wishes to gain a little more insight into
the relationship that exists between the rationals and the irrationals. A final
important result arising from a deeper study of these matters, and to which
we make only a passing reference, is the fact that between them, the rational
and the irrational numbers exhaust all the possible types of numbers. In
effect this is saying that if we work with real rational and irrational numbers,
then there are no gaps left in the number system that can only be filled by the
introduction of yet another kind of number. This is important because it means
that however we may arrive at a number, as the result of a finite or an infinite
sequence of operations, it will either be a rational or an irrational number.
If the set R* of rational numbers is supplemented by the inclusion of the
irrational numbers, the resulting set $R$ is called the real number system or
the field of real numbers. The fact that $R$ contains all possible types of real
numbers is expressed by saying that the set of real numbers $R$ is complete.
Consequently, until we have occasion to consider entities such as $\sqrt{-1}$
there will be no need for us to work outside the real number system $R$.
Numbers called transcendental numbers form an important subset of the
irrational numbers. These are numbers like $e$ and $\pi$ which cannot be obtained as
the root of any polynomial with rational coefficients (cf. § 2-3).
For future reference it will be useful to summarize the basic properties
of the field of real numbers already known to the reader. We now do this
making full use of the mathematical shorthand so far introduced.
Additive properties
A-1 $a, b \in R \Rightarrow (a + b) \in R$; $R$ is closed with respect to addition.
A-2 $a, b \in R \Rightarrow a + b = b + a$; addition is commutative.
A-3 $a, b, c \in R \Rightarrow (a + b) + c = a + (b + c)$; addition is associative.
A-4 There exists a number $0 \in R$ such that $0 + a = a$ for all $a \in R$;
there is a zero element in $R$.
A-5 If $a \in R$ then there exists a number $-a \in R$ such that $-a + a = 0$;
each number has a negative.
Multiplicative properties
M-1 $a, b \in R \Rightarrow ab \in R$; $R$ is closed with respect to multiplication.
M-2 $a, b \in R \Rightarrow ab = ba$; multiplication is commutative.
M-3 $a, b, c \in R \Rightarrow (ab)c = a(bc)$; multiplication is associative.
M-4 There exists a number $1 \in R$ such that $1 \cdot a = a$ for all $a \in R$;
there is a unit element in $R$.
M-5 Let $a$ be a non-zero number in $R$, then there exists a number $a^{-1} \in R$
such that $a^{-1}a = 1$; each non-zero number has an inverse. Usually
we shall write $1/a$ in place of $a^{-1}$, so that the two expressions are
to be taken as being synonymous.
Distributive property
D-1 $a, b, c \in R \Rightarrow a(b + c) = ab + ac$; multiplication is distributive.
The above results are self-evident for real numbers and are usually called
the real number axioms. They are used by mathematicians as the logical basis
for our number system. Later we shall encounter other systems of objects
which, though sharing many of the properties of real numbers, are not
themselves numbers. For future reference we mention matrices, for which M-1 to
M-5 are not generally true, and vectors, for which two forms of multiplication
exist and for which M-5 has no meaning.
It is an immediate consequence of these axioms that commonplace
arithmetic operations may be performed without question. For example, it is
fundamental to arguments that a − b = 0 ⇔ a = b, and aξ = aη ⇒ ξ = η
if a ≠ 0. These, and other elementary results of similar form, follow directly
as a result of simple applications of the axioms. As it would be out of place
to develop these ideas here we shall indicate the proof of just one such result,
stating the others in the form of Problem 1-37 which is left to be attempted by
the reader who wishes to question further the basis of the real number system.
We prove that there is a unique zero in R. The argument is again by
contradiction, for we first suppose that two different zero elements exist and
denote them by 0 and 0'. Then by A-4 it follows that 0 + 0' = 0' and 0' + 0
= 0, whence by A-2 we must have 0 = 0', thereby establishing the uniqueness
of the zero.
So far our list of properties of real numbers has been concerned only with
equalities. The valuable property of real numbers that they can be arranged
according to size, or ordered, has so far been overlooked. It is of course this
property that allows us to represent real numbers by points on a line and
thereby to construct graphs and other valuable geometrical representations.
Ordering is achieved by utilizing the concept 'greater than' which when used
in the form 'a greater than b', is denoted by a > b. Hence to the other real
number axioms must be added:
Order properties
O-1 If a ∈ R then exactly one of the following is true: either a > 0 or
a = 0 or −a > 0.
O-2 a, b ∈ R, a > 0, b > 0 ⇒ a + b > 0, and ab > 0.
We now define a > b and a < b, the latter being read 'a less than b', by
a > b ⇒ a − b > 0 and a < b ⇒ b − a > 0. The following results are
obvious consequences of the real number system and are called inequalities.
In places they also involve the symbol ≥ which is to be read 'greater than or
equal to'.
Elementary inequalities in R
I-1 a > b and c > d ⇒ a + c > b + d.
I-2 a > b > 0 and c > d > 0 ⇒ ac > bd.
I-3 k > 0 and a > b ⇒ ka > kb.
I-4 a > b ⇒ −a < −b.
I-5 a < 0, b > 0 ⇒ ab < 0; a < 0, b < 0 ⇒ ab > 0.
I-6 a > 0 ⇒ a⁻¹ > 0; a < 0 ⇒ a⁻¹ < 0.
I-7 a > b > 0 ⇒ b⁻¹ > a⁻¹ > 0; a < b < 0 ⇒ b⁻¹ < a⁻¹ < 0.
An important use of inequalities is in defining intervals on a line and
regions in a plane. Using the order property of numbers to associate numbers
with points on a line, an interval on a line may be considered to be a segment
of the line between two given points or numbers, a and b, say. Three cases
arise according as to whether (a) both end points are included in the interval,
Fig. 1-8 Intervals on a line: (a) closed interval a ≤ x ≤ b; (b) open interval
a < x < b; (c) semi-open interval a < x ≤ b.
(b) both end points are excluded from the interval, or (c) one is included and
one is excluded. These are called, respectively, (a) a closed interval, (b) an
open interval, (c) a semi-open interval. Namely, an interval is closed at an
end which contains the end point, otherwise it is open at that end. In terms of
the points a and b and the variable x representing an arbitrary point on the
line these are written :
(a) a ≤ x ≤ b; closed interval;
(b) a < x < b; open interval;
(c) a < x ≤ b or a ≤ x < b; semi-open interval.
Thus 1 ≤ x < 2 defines the semi-open interval containing the point x = 1
and the points up to, but not including, x = 2. These are represented in
Fig. 1-8 in which a solid line represents points in the interval, a circle represents
an excluded point, and a dot an included point.
Special cases occur when one or both of the end points of the interval
are at infinity. The intervals −∞ < x < a and b < x < ∞ are called
semi-infinite intervals and −∞ < x < ∞ is an unbounded interval or, more
simply, the complete real line.
We illustrate the corresponding definition of a region in the (x, y)-plane
by considering the three inequalities x² + y² < a², y < x, x ≥ 0. The first
defines the interior of a circle of radius a centred on the origin, the second
defines points below, but not on, the straight line y = x, and the third defines
points in the right half of the (x, y)-plane including the points on the y-axis
Fig. 1-9 Regions in plane: (a) region boundaries x² + y² = a², y = x, and x = 0;
(b) region x² + y² < a², y < x, x ≥ 0.
itself. These curves represent boundaries of the regions in question and the
boundary points are only to be included in the region when possible equality
is indicated by use of the signs ≥ or ≤. The three regions are indicated in
Fig. 1-9 (a) in which a full line indicates that points on it are to be included, a
dotted line indicates that points on it are to be excluded, and shading indicates
the side of the line on which the region in question must lie. Fig. 1-9 (b)
indicates the region in which all the inequalities are satisfied.
Simple inequalities of the form (x + 1)(x + 3) > (x − 1)(x − 2) also
define intervals. For, clearing the brackets, we see that x² + 4x + 3
> x² − 3x + 2 which, by simple application of the elementary inequalities
just listed, reduces to x > −1/7 defining a semi-infinite interval, open at the
end x = −1/7.
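As a rough numerical check of this reduction, the following sketch uses exact rational arithmetic to test the inequality at the end point x = −1/7 and a short distance on either side of it. The function name satisfies and the step of 1/1000 are choices made here for illustration only; a finite set of trial values is of course no substitute for the algebraic argument.

```python
from fractions import Fraction

def satisfies(x):
    """Return True when (x + 1)(x + 3) > (x - 1)(x - 2)."""
    return (x + 1) * (x + 3) > (x - 1) * (x - 2)

boundary = Fraction(-1, 7)
step = Fraction(1, 1000)  # illustrative offset, not taken from the text

# At the end point both sides are equal, so the strict inequality fails.
at_boundary = satisfies(boundary)
# Just to the right the inequality holds; just to the left it fails.
right_of_boundary = satisfies(boundary + step)
left_of_boundary = satisfies(boundary - step)
print(at_boundary, right_of_boundary, left_of_boundary)
```

The end point itself fails the test, which agrees with the interval being open at x = −1/7.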
The elementary inequalities may often be used to advantage to simplify
complicated algebraic expressions by yielding helpful qualitative information
as the following example indicates.
Example 1-5 Prove that if a₁, a₂, ..., aₙ and b₁, b₂, ..., bₙ are positive
real numbers, then
\[ \min_{1 \le r \le n} \left(\frac{a_r}{b_r}\right) \le \frac{a_1 + a_2 + \cdots + a_n}{b_1 + b_2 + \cdots + b_n} \le \max_{1 \le r \le n} \left(\frac{a_r}{b_r}\right). \]
Here the left-hand side of the inequality is to be interpreted as meaning
the minimum value of the expression (aᵣ/bᵣ), with r assuming any of the
integral values between 1 and n and the right-hand side is to be similarly
interpreted reading maximum in place of minimum. The result follows by
noticing that
\[ \frac{a_1 + a_2 + \cdots + a_n}{b_1 + b_2 + \cdots + b_n} = \frac{1}{\sum_{r=1}^{n} b_r} \left[ b_1\left(\frac{a_1}{b_1}\right) + b_2\left(\frac{a_2}{b_2}\right) + \cdots + b_n\left(\frac{a_n}{b_n}\right) \right], \]
where \( \sum_{r=1}^{n} b_r = b_1 + b_2 + \cdots + b_n \). For if each of the expressions (a₁/b₁),
(a₂/b₂), ..., (aₙ/bₙ) is replaced by the smallest of these ratios, which could
be the value taken by all the expressions if a₁ = a₂ = ... = aₙ > 0 and
b₁ = b₂ = ... = bₙ > 0, then
\[ \frac{a_1 + a_2 + \cdots + a_n}{b_1 + b_2 + \cdots + b_n} \ge \frac{(b_1 + b_2 + \cdots + b_n)}{\sum_{r=1}^{n} b_r} \min_{1 \le r \le n} \left(\frac{a_r}{b_r}\right) = \min_{1 \le r \le n} \left(\frac{a_r}{b_r}\right), \]
which is the left half of the inequality. The right half follows by identical
reasoning if maximum is written in place of minimum.
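The inequality of Example 1-5 is easily tried out numerically. The sketch below is illustrative only: the function name ratio_bounds and the sample values of aᵣ and bᵣ are assumptions made here and do not come from the text. It returns the smallest ratio, the ratio of the sums, and the largest ratio, using exact fractions so that no rounding can disturb the comparison.

```python
from fractions import Fraction

def ratio_bounds(a, b):
    """Return (min a_r/b_r, sum(a)/sum(b), max a_r/b_r) for positive a_r, b_r."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    if any(x <= 0 for x in a) or any(y <= 0 for y in b):
        raise ValueError("all a_r and b_r must be positive")
    ratios = [Fraction(x) / Fraction(y) for x, y in zip(a, b)]
    total = sum(map(Fraction, a)) / sum(map(Fraction, b))
    return min(ratios), total, max(ratios)

# Sample values chosen for this sketch (hypothetical, not from the text).
a = [1, 3, 2, 5]
b = [2, 4, 1, 5]
smallest, quotient_of_sums, largest = ratio_bounds(a, b)
print(smallest, quotient_of_sums, largest)
```

For these values the quotient of the sums, 11/12, lies between the smallest ratio 1/2 and the largest ratio 2, as the example requires.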
1-4 Absolute value of a real number
Definition 1-5 The absolute value |a| of the real number a provides a
measure of its size without regard to sign, and is defined as follows:
\[ |a| = \begin{cases} a & \text{when } a \ge 0 \\ -a & \text{when } a < 0. \end{cases} \]
Thus if a = 3, then |a| = 3 and if a = −5.6 then |a| = 5.6.
There are three immediate consequences of this definition which we now
enumerate as
Theorem 1-4 If a, b ∈ R then
(a) |ab| = |a| |b|,
(b) |a + b| ≤ |a| + |b|,
(c) |a − b| ≥ ||a| − |b||.
The proof is simply a matter of enumerating the possible combinations
of positive and negative a and b, and then making a direct application of the
definition of the absolute value. We shall only illustrate the proof of (a).
There are four cases to be considered: firstly a ≥ 0, b ≥ 0, secondly
a ≥ 0, b < 0, thirdly a < 0, b ≥ 0, and fourthly a < 0, b < 0. If a ≥ 0, b ≥ 0
then ab ≥ 0 and so |ab| = ab = |a| |b|. The second and third situations are
essentially similar so we shall discuss only the second. As a ≥ 0, b < 0 we
have ab ≤ 0, whence |ab| = −ab = a(−b) = |a| |b|. Finally, if a < 0, b < 0
then ab > 0, whence |ab| = ab = (−a)(−b) = |a| |b|, establishing (a). For
reasons we give later, result (b) is usually called the triangle inequality.
The absolute value may also be used to define intervals since an expression
of the form |a − x| ≥ 2 implies two inequalities according as a − x is
positive or negative. If a − x ≥ 0 then |a − x| = a − x and we have
a − x ≥ 2 or x ≤ a − 2. However if a − x < 0, then by the definition of the
absolute value of a − x we must have |a − x| = −(a − x) showing that
−(a − x) ≥ 2, or, x ≥ a + 2. Taken together the results require that x
may be equal to or greater than 2 + a or equal to or less than a − 2. Thus x may
not lie in the intervening interval of length 4 between x = a − 2 and x = a + 2.
This is illustrated in Fig. 1-10 (a) where a solid line is again used to indicate
points in the interval satisfied by |a − x| ≥ 2 and the dots are to be included
in the appropriate intervals.
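The two descriptions of this set of points, one in terms of the absolute value and one in terms of the pair of intervals, can be compared directly. In the sketch below the value a = 5 and the sample points are chosen purely for illustration, and both function names are introduced here rather than taken from the text.

```python
def in_solution_set(x, a):
    """True when x satisfies |a - x| >= 2."""
    return abs(a - x) >= 2

def in_described_intervals(x, a):
    """True when x <= a - 2 or x >= a + 2, the two intervals derived above."""
    return x <= a - 2 or x >= a + 2

a = 5  # illustrative value of a, assumed for this sketch
samples = [a - 3, a - 2, a - 1, a, a + 1, a + 2, a + 3]
agree = all(in_solution_set(x, a) == in_described_intervals(x, a) for x in samples)
print(agree)
```

The end points x = a − 2 and x = a + 2 are among the samples and belong to the set, in keeping with the dots shown at the ends of the intervals.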
By exactly similar reasoning we see that if we consider the inequality
1 ≤ |x + 1| < 2, then if x + 1 ≥ 0, |x + 1| = x + 1 and the inequality
becomes 1 ≤ x + 1 < 2. Hence the interval is 0 ≤ x < 1. However, if
x + 1 < 0, then |x + 1| = −x − 1 and so the inequality becomes
1 ≤ −x − 1 < 2 giving rise to the interval −3 < x ≤ −2. These intervals
are shown in Fig. 1-10 (b) with circles indicating points excluded from the
end of the solid line intervals and dots indicating points to be included.
Fig. 1-10 Intervals on a line: (a) |a − x| ≥ 2; (b) 1 ≤ |x + 1| < 2.
1-5 Representation of numbers
The decimal representation of real numbers is usual in all ordinary arithmetic
work and involves expressing a real number as the sum of an integral part
and a decimal fraction. Each of the parts is represented as the sum of multiples
of powers of 10, with the powers being positive integers or zero when repre-
senting the integral part and negative integers when representing the decimal
part. The number 10 that forms the basis of the decimal system is called the
base of the number system.
The integral part r of a finite real number a is thus expressible as
\[ r = a_n(10^n) + a_{n-1}(10^{n-1}) + \cdots + a_1(10^1) + a_0(10^0), \]
where n is suitably chosen, and the coefficients aᵢ are either zero or an integer
between 1 and 9. Hence, in reality, the number 2049 is a convenient
representation of 2(10³) + 0(10²) + 4(10¹) + 9(10⁰), with the positions of the digits
indicating the positive powers of 10 by which they are to be multiplied before
addition.
Similarly, if the decimal fraction part d of a real number a terminates
after n decimal places, then it is expressible in the form
\[ d = \frac{b_1}{10} + \frac{b_2}{10^2} + \cdots + \frac{b_n}{10^n}, \]
with the coefficients bᵢ again being either zero or an integer between 1 and 9.
Hence the decimal number 0.3012 is, in reality, the representation of
\[ \frac{3}{10} + \frac{0}{10^2} + \frac{1}{10^3} + \frac{2}{10^4}, \]
with the positions of the digits indicating the negative powers of 10 by
which they are to be multiplied before addition.
In general then, the decimal number that is written
\[ \underbrace{a_m a_{m-1} \cdots a_1 a_0}_{(m+1)\ \text{digits}} \,.\, \underbrace{b_1 b_2 \cdots b_n}_{n\ \text{digits}} \]
and which terminates after n decimal places, is the representation of
\[ a_m(10^m) + a_{m-1}(10^{m-1}) + \cdots + a_1(10^1) + a_0(10^0) + \frac{b_1}{10} + \frac{b_2}{10^2} + \cdots + \frac{b_n}{10^n}. \]
Consideration of the representation of non-terminating decimal fractions
and irrational numbers will be postponed until we discuss sequences and
limits, since the approximation of real numbers by rationals has not yet been
discussed.
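For a terminating representation the rule just described amounts to a finite sum, and a minimal sketch of it is given below. The function name positional_value and its argument layout (lists of digits, most significant first) are assumptions made for this illustration; the optional base argument anticipates bases other than 10.

```python
from fractions import Fraction

def positional_value(integral_digits, fraction_digits, base=10):
    """Sum a_i * base**i over the integral digits and b_j / base**j over the
    fractional digits, with both digit lists given most significant first."""
    value = Fraction(0)
    m = len(integral_digits) - 1
    for i, digit in enumerate(integral_digits):
        value += digit * Fraction(base) ** (m - i)
    for j, digit in enumerate(fraction_digits, start=1):
        value += Fraction(digit, base ** j)
    return value

# The two numbers used as examples in the text: 2049 and 0.3012.
whole = positional_value([2, 0, 4, 9], [])
frac = positional_value([0], [3, 0, 1, 2])
print(whole, frac)
```

Exact fractions are used so that the fractional part is recovered as 3012/10000 without any rounding.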
There is no reason why the base of the number system should not be any
integer N > 1 and, indeed, in digital computing extensive use is made of the
binary system. This is the system of representation using the base 2. Hence a
binary number will contain only the digits 1 and 0 with their position
indicating the power of 2 involved. Thus we may write
\[ 11 = 1(2^3) + 0(2^2) + 1(2^1) + 1(2^0) \]
so that the binary representation of 11 is 1011. Similarly, the rational number
9/16 may be written
\[ \frac{9}{16} = \frac{1}{2} + \frac{0}{2^2} + \frac{0}{2^3} + \frac{1}{2^4}, \]
showing that its binary fraction form is 0.1001. Hence the binary form of the
number 11 + 9/16 becomes 1011.1001 and, as in the case of decimals, the position of
a digit relative to the binary point indicates the power of two by which it is
to be multiplied before addition.
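The conversion to binary can be mechanized by handling the integral part and the fractional part separately. The following is a sketch under stated assumptions: the function name to_binary and the cut-off max_places are introduced here for illustration, and the fractional digits are found by repeated doubling, which is one common method and not necessarily the one the reader would use by hand.

```python
from fractions import Fraction

def to_binary(value, max_places=16):
    """Binary string of a non-negative rational; the fractional part is cut
    off after max_places digits if it has not terminated by then."""
    value = Fraction(value)
    if value < 0:
        raise ValueError("value must be non-negative")
    whole = value.numerator // value.denominator
    frac = value - whole
    digits = bin(whole)[2:]          # integral part, e.g. 11 -> '1011'
    places = []
    while frac and len(places) < max_places:
        frac *= 2                    # shift one binary place to the left
        bit = 1 if frac >= 1 else 0
        places.append(str(bit))
        frac -= bit
    return digits + ("." + "".join(places) if places else "")

print(to_binary(11))                     # integral part only
print(to_binary(Fraction(9, 16)))        # fractional part only
print(to_binary(11 + Fraction(9, 16)))   # 11 and 9/16 combined
```

Only fractions whose denominators are powers of 2 terminate in this representation; others are truncated at max_places digits.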
It is easily verified that the addition and multiplication tables for binary
numbers are as illustrated in the following two tables:
Binary addition          Binary multiplication

 + | 0  1                 × | 0  1
---+------               ---+------
 0 | 0  1                 0 | 0  0
 1 | 1  0                 1 | 0  1
Both tables are entered by selecting one digit in the first column and one
in the first row, when the result of the operation appropriate to the table,
namely addition or multiplication, is shown in the body of the table. For
example, using the addition table and taking the digits 1 in the first column
and 0 in the first row we see that 1 + 0 = 1. Similarly, taking the digits
1 in the first column and 1 in the first row we see that 1 + 1 = 0. The
interpretation of this latter result is, of course, that a digit 1 must be transferred to
the next higher power of 2, corresponding to the transference of multiples of
powers of 10 when performing ordinary addition. The multiplication table is
straightforward and needs no further comment.
The examples that now follow illustrate the addition, multiplication, and
subtraction of simple binary numbers. We shall let a = 12, b = 11 and form
a + b, ab, and a − b using binary notation. The binary representations of a
and b are a = 1100, b = 1011 and so we have:
Addition
      1 1 0 0
      1 0 1 1  +
    ---------
    1₁0 1 1 1
Multiplication
            1 1 0 0
            1 0 1 1  ×
    ---------------
            1 1 0 0
          1 1 0 0
      1 1 0 0
    ---------------
    1₁0₁0₁0₁0 1 0 0
Here the subscript 1 has been used to indicate the transference of a digit 1
corresponding to the result 1 + 1=0.
The subtraction a − b is equally straightforward provided it is recalled
that when the subtraction of digits 0 − 1 is encountered, it is necessary to
'borrow' a digit 1 from the next higher position in the number b. Thus the
result would be to write 1 in place of 0 − 1 and to add 1 to the next higher
position in b.
Subtraction
      1 1 0 0
      1 0 1 1  −
    ---------
      0 0 0 1
The expressions a + b, ab, and a − b for a = 12, b = 11 are thus
\[ a + b = 1(2^4) + 0(2^3) + 1(2^2) + 1(2^1) + 1(2^0) = 23, \]
\[ ab = 1(2^7) + 0(2^6) + 0(2^5) + 0(2^4) + 0(2^3) + 1(2^2) + 0(2^1) + 0(2^0) = 132, \]
\[ a - b = 0(2^3) + 0(2^2) + 0(2^1) + 1(2^0) = 1. \]
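The column-by-column procedure for binary addition, with a digit 1 transferred whenever 1 + 1 occurs, and multiplication by the addition of shifted copies may be sketched as follows. The function names add_binary and multiply_binary are introduced here for illustration only; numbers are held as strings of the digits 0 and 1, and Python's built-in int(..., 2) is used merely to check the results in base 10.

```python
def add_binary(x, y):
    """Add two binary digit strings column by column, carrying as needed."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    carry, out = 0, []
    for dx, dy in zip(reversed(x), reversed(y)):
        total = int(dx) + int(dy) + carry
        out.append(str(total % 2))   # digit written in this column
        carry = total // 2           # digit transferred to the next column
    if carry:
        out.append("1")
    return "".join(reversed(out))

def multiply_binary(x, y):
    """Multiply by summing shifted copies of x, one for each digit 1 in y."""
    result = "0"
    for shift, digit in enumerate(reversed(y)):
        if digit == "1":
            result = add_binary(result, x + "0" * shift)
    return result

a, b = "1100", "1011"            # 12 and 11 in binary
total = add_binary(a, b)
product = multiply_binary(a, b)
difference = int(a, 2) - int(b, 2)   # subtraction checked in base 10 only
print(total, int(total, 2))
print(product, int(product, 2))
print(difference)
```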
1-6 Mathematical induction
Mathematical propositions often involve some fixed integer n, say, in a special
role and it is desirable to infer the form taken by the proposition for arbitrary
integral n from the form taken by it for the specific value n = m. The logical
method by which the proof of the general proposition, if true, may be estab-
lished, is based on the properties of natural numbers and is called mathematical
induction.
In brief, it depends for its success on the obvious fact that if A is some set
of natural numbers and 1 ∈ A, then the statement that whenever an integer
n belongs to A, so also does its successor, implies that A = N, the set of natural
numbers.
The formal statement of the process of mathematical induction is expressed
by the following theorem where, for simplicity, the mathematical proposition
corresponding to integer n is denoted by S(n).
Theorem 1-5 (mathematical induction) If it can be shown that,
(a) when n = m, the proposition S(m) is true,
and
(b) if for n ≥ m, when S(n) is true then so also is S(n + 1),
then the proposition S(n) is true for all natural numbers n ≥ m.
A simple illustrative example will help here and we now prove inductively
that the sum \( \sum_{r=1}^{n} r \) of the first n natural numbers is given by n(1 + n)/2. In
other words, in this example the proposition denoted by S(n) is that the
following result is true:
\[ 1 + 2 + \cdots + n = n(1 + n)/2. \]
Proof, step (a) First the proposition must be shown to be true for some
specific value n = m. Any integral value m will suffice but if we set m = 1
the proposition corresponding to S(1) is immediately obvious. If, instead, we
had chosen m = 3, then it is easily verified that proposition S(3) is true,
namely that 1 + 2 + 3 = 3(1 + 3)/2.
Proof, step (b) We must now assume that proposition S(n) is true and
attempt to show that this implies that the proposition S(n + 1) is true. If
S(n) is true then
\[ 1 + 2 + \cdots + n = n(1 + n)/2 \]
and, adding (n + 1) to both sides, we obtain
\[ 1 + 2 + \cdots + n + (n + 1) = n(1 + n)/2 + (n + 1) = (n + 1)(2 + n)/2. \]
However, this is simply a statement of proposition S(n + 1) obtained by
replacing n by n + 1 in proposition S(n). Hence S(1) is true and S(n)
⇒ S(n + 1) so, by the conditions of Theorem 1-5, we have established that
S(n) is valid for all n.
Later we shall use this form of proof in cases less trivial than the above
example which simply involved establishing the sum of an arithmetic progres-
sion.
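The two steps of the inductive argument for the sum of the first n natural numbers can be mirrored numerically, though only for finitely many n, so the sketch below is a check and in no sense a proof. The function names and the ranges tested are choices made here for illustration.

```python
def closed_form(n):
    """The proposed sum n(1 + n)/2 of the first n natural numbers."""
    return n * (1 + n) // 2

def base_case_holds():
    """Step (a): the proposition S(1)."""
    return closed_form(1) == 1

def inductive_step_holds(n):
    """Step (b): adding (n + 1) to the formula for n gives the formula for n + 1."""
    return closed_form(n) + (n + 1) == closed_form(n + 1)

steps_ok = base_case_holds() and all(inductive_step_holds(n) for n in range(1, 1001))
direct_ok = all(sum(range(1, n + 1)) == closed_form(n) for n in range(1, 101))
print(steps_ok, direct_ok)
```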
As another illustration of an inductive argument we now consider the
determination of the nth term in the sequence of numbers u₀, u₁, u₂, ...,
defined sequentially by the equation
\[ u_n = 2u_{n-1} + 1. \tag{1-17} \]
Equations of this form which define a sequence of discrete numbers uₙ
are called first-order difference equations. It is clear that this difference
equation provides us with the algebraic rule by which the nth term of the
sequence may be computed once the first term u₀ has been specified. Generally
speaking, any rule which specifies the form of computation to be pursued in
order to arrive at the solution of a given problem is called an algorithm.
A few moments' experiment will suffice to convince the reader that the
solution to Eqn (1-17) may be expressed in terms of u₀ by the equation
\[ u_n = 2^n u_0 + (2^n - 1). \tag{1-18} \]
The initial term u₀ of the sequence is arbitrary and on account of this fact
such a solution is called a general solution of the first-order difference equation
(1-17). Once u₀ is specified by requiring that u₀ = C, say, then the solution is
said to be a particular solution.
The proof of Eqn (1-18) by induction again proceeds in two parts, with
the proposition S(n) being that Eqn (1-18) is the solution of Eqn (1-17).
Proof, step (a) If m = 1, then u₁ = 2u₀ + (2 − 1) = 2u₀ + 1, showing that
the proposition S(1) is true.
Proof, step (b) Assuming the proposition S(n) is true, then
\[ 2u_n + 1 = 2[2^n u_0 + (2^n - 1)] + 1 = 2^{n+1} u_0 + (2^{n+1} - 1) = u_{n+1}, \]
showing that S(n) ⇒ S(n + 1). The result is thus true for all n.
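The 'few moments' experiment' mentioned above is easily carried out by machine. In the sketch below the algorithm of Eqn (1-17) is applied repeatedly and compared with the general solution (1-18); the starting value u₀ = 3 and the function names are assumptions made purely for illustration.

```python
def iterate(u0, n):
    """Apply u_k = 2*u_(k-1) + 1 a total of n times, starting from u0."""
    u = u0
    for _ in range(n):
        u = 2 * u + 1
    return u

def general_solution(u0, n):
    """Eqn (1-18): u_n = 2**n * u0 + (2**n - 1)."""
    return 2 ** n * u0 + (2 ** n - 1)

u0 = 3  # illustrative starting value, chosen for this sketch
terms = [iterate(u0, n) for n in range(6)]
matches = all(iterate(u0, n) == general_solution(u0, n) for n in range(20))
print(terms, matches)
```

Agreement for the first twenty terms is of course only suggestive; it is the inductive proof that establishes the result for all n.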
To conclude this section, having introduced the notion of a difference
equation let us take the concept a little further so that it can be used in more
general circumstances. A homogeneous linear difference equation of order 2
is a relationship of the form
\[ u_n + a u_{n-1} + b u_{n-2} = 0, \tag{1-19} \]
where a and b are real constants and uₙ₋₂, uₙ₋₁, uₙ are three consecutive
members of a sequence of numbers. Given any two consecutive members in
the sequence, say u₀ and u₁, then Eqn (1-19) provides an algorithm by which
any other member of the sequence may be computed.
If we seek a solution uₙ of the form
\[ u_n = A\lambda^n, \tag{1-20} \]
where A and λ are real constants, then substitution into Eqn (1-19) shows that
\[ \lambda^2 + a\lambda + b = 0. \tag{1-21} \]
This is called the characteristic equation associated with the difference
equation (1-19) and shows that solutions of the form of Eqn (1-20) are only
possible when λ is equal to one of the two roots λ₁ and λ₂ of Eqn (1-21),
which we assume to be real numbers. If λ₁ ≠ λ₂, then Aλ₁ⁿ and Bλ₂ⁿ are both
solutions of Eqn (1-19) and it is easy to show that
\[ u_n = A\lambda_1^n + B\lambda_2^n \tag{1-22} \]
is also a solution, where A and B are arbitrary real constants. This result is
the general solution of Eqn (1-19). Given specific values for u₀ and u₁,
A and B can be deduced by substituting into Eqn (1-22) and hence a particular
solution found.
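Suppose, for example, that the difference equation was
\[ u_n - u_{n-1} - u_{n-2} = 0, \]
and that u₀ = u₁ = 1. Then the characteristic equation is
\[ \lambda^2 - \lambda - 1 = 0, \]
with the two roots λ₁ = (1 + √5)/2 and λ₂ = (1 − √5)/2. Hence the general
solution has the form
\[ u_n = A\left(\frac{1+\sqrt{5}}{2}\right)^n + B\left(\frac{1-\sqrt{5}}{2}\right)^n. \tag{1-23} \]
To deduce the values of A and B particular to our problem we use the
initial conditions u₀ = 1 and u₁ = 1 to deduce from Eqn (1-23) that
\[ 1 = A + B \quad (\text{case } n = 0,\ u_0 = 1) \]
\[ 1 = A\left(\frac{1+\sqrt{5}}{2}\right) + B\left(\frac{1-\sqrt{5}}{2}\right) \quad (\text{case } n = 1,\ u_1 = 1). \]
Solving these equations for A and B we find
\[ A = \frac{\sqrt{5}+1}{2\sqrt{5}}, \qquad B = \frac{\sqrt{5}-1}{2\sqrt{5}}, \]
whence the particular solution is
\[ u_n = \frac{\sqrt{5}+1}{2\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n + \frac{\sqrt{5}-1}{2\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n. \]
The first few numbers u₀, u₁, u₂, ..., of the sequence generated by this
algorithm are
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...,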
and comprise the well-known Fibonacci sequence of numbers. This sequence
of numbers occurs naturally in the study of regular solids and in numerous
other parts of mathematics. Naturally if only the first few members of the
sequence are required then they are most easily found by use of the algorithm
itself, which in the form
\[ u_n = u_{n-1} + u_{n-2} \]
states that each member of the sequence is the sum of its two predecessors.
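Both routes to the Fibonacci numbers, the algorithm itself and the particular solution built from the two roots of the characteristic equation, may be compared as in the sketch below. The function names are introduced here for illustration; the formula is evaluated in floating-point arithmetic and rounded to the nearest integer, which is adequate for the first few terms but is an assumption that would need care for large n.

```python
from math import sqrt

def by_recurrence(count):
    """First `count` terms of u_n = u_(n-1) + u_(n-2) with u_0 = u_1 = 1."""
    terms = [1, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

def by_formula(n):
    """Evaluate A*l1**n + B*l2**n with the constants A and B found above."""
    root5 = sqrt(5)
    l1, l2 = (1 + root5) / 2, (1 - root5) / 2
    A, B = (root5 + 1) / (2 * root5), (root5 - 1) / (2 * root5)
    return round(A * l1 ** n + B * l2 ** n)   # rounding absorbs float error

terms = by_recurrence(10)
agree = all(by_formula(n) == terms[n] for n in range(10))
print(terms, agree)
```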
It is not difficult to see that if the roots of the characteristic equation
(1-21) are equal so that λ₁ = λ₂ = μ, say, then Aμⁿ is a solution of Eqn (1-19).
In terms of Eqn (1-19) this is equivalent to saying that a² = 4b and μ = −a/2.
However Aμⁿ cannot be the general solution since it only involves one
arbitrary constant A, and it is necessary to have two such constants in the
general solution to allow the specification of the initial conditions u₀ and u₁.
The difficulty is easily resolved once we notice that nBμⁿ, with B an arbitrary
real constant, is also a solution of Eqn (1-19). This is easily verified by direct
substitution. For then we have for the general solution in the case of equal
roots in the characteristic equation,
\[ u_n = (A + nB)\mu^n. \tag{1-24} \]
To illustrate this situation, suppose that we are required to solve the
difference equation
\[ u_n = 6u_{n-1} - 9u_{n-2} \]
subject to the initial conditions u₀ = 1, u₁ = 2. Then the characteristic
equation becomes
\[ \lambda^2 - 6\lambda + 9 = 0, \]
with the double root λ = 3. From Eqn (1-24) the general solution must thus be
\[ u_n = (A + nB) \cdot 3^n. \]
Using the initial conditions u₀ = 1, u₁ = 2, then, shows that
\[ 1 = A \quad \text{and} \quad 2 = 3(A + B), \]
so that the particular solution to the problem in question is
\[ u_n = (1 - \tfrac{1}{3}n)3^n. \]
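The equal-root case can be checked in the same way. The sketch below takes u₀ = 1 and u₁ = 2, the values implied by the pair of equations 1 = A and 2 = 3(A + B), so that A = 1 and B = −1/3; the function names are illustrative, and exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction

def by_recurrence(count):
    """First `count` terms of u_n = 6*u_(n-1) - 9*u_(n-2), u_0 = 1, u_1 = 2."""
    terms = [1, 2]
    while len(terms) < count:
        terms.append(6 * terms[-1] - 9 * terms[-2])
    return terms[:count]

def by_formula(n):
    """u_n = (A + n*B) * 3**n with A = 1 and B = -1/3."""
    return (1 + n * Fraction(-1, 3)) * 3 ** n

terms = by_recurrence(8)
agree = all(by_formula(n) == terms[n] for n in range(8))
print(terms, agree)
```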
PROBLEMS
Section 1-1
1-1 Enumerate the elements in the following sets in which I signifies the set of
natural positive and negative integers including zero:
(a) S = {n | n ∈ I, 5 < n² < 47};
(b) S = {n³ | n ∈ N, 15 < rfi < 40};
(c) S = {(m, n) | m, n ∈ I, 12 < m² + n² < 18};
(d) S = {(m, n, m + n) | m, n ∈ N, 45 < m² + n², 3 < m + n < 9};
(e) S = {x | x ∈ N, x² + Olx − 11 = 0}.
1-2 Express the following sets in the notation of the previous question:
(a) the set of positive integers whose cubes lie between 7 and 126;
(b) the set of integers which are the squares of the integers lying between M
and N (0 < N < M);
(c) the points in the plane that lie between circles of radii 1 and 3 drawn about
the origin and which have x-coordinates greater than 0.5.
1-3 Give an example of
(a) a finite set having numerical elements,
(b) a finite set having non-numerical elements,
and in each case give an example of a proper subset.
1-4 Give an example of
(a) a set of ordered triples involving numerical quantities,
(b) a set of ordered triples involving non-numerical quantities,
and in each case give an example of a proper subset.
1-5 State the relationships between the sets A and B if:
(a) A = N, B = {2n | n ∈ N};
(b) A = {sin x | x = (1 + 12n)π/6, n ∈ N}, B = {½};
(c) A = {1, 2, 3, 4}, B = {5, 7, 9, 11}.
1-6 Form the union, intersection, and the complement of B relative to A of the
sets A and B if:
(a) A = N, B = {2n | n ∈ N};
(b) A = {a, b, c, 0, 2, 4}, B = {d, e, f, 1, 3, 6, 7};
(c) A = {1, √2, 2, 3, √5, 6}, B = {0, √2, √5}.
1-7 Construct Venn diagrams for the union and intersection of the sets A and B if:
(a) A is the set of points interior to the unit square (that is, square having
side of unit length) with one corner at the origin and lying entirely in the
first quadrant, and B is the set of points exterior to the unit circle centred
on the origin;
(b) A is the set of points interior to the isosceles triangle of unit side with its
centre of gravity at the origin and a side parallel to the x-axis, and B is
the unit square having its centre at the origin and a side parallel to the
x-axis.
1-8 Represent by points on a graph the 36 possible outcomes of throwing two
dice, each with faces numbered 1 to 6. Identify the set of points at which the
sum of the scores on the two dice is greater than or equal to 7.
1-9 By using Venn diagrams, prove Eqns (1-6) and (1-7) for sets which may be
represented by points in the plane.
1-10 Complete the details of the analytical proof of the first stated result of
Theorem 1-1.
1-11 Illustrate by means of a Venn diagram the result A\(B ∩ C) = (A\B) ∪ (A\C)
of Theorem 1-1.
1-12 The expression (A\B) ∪ (B\A) is called the symmetric difference of sets A and
B. Illustrate the result by means of a Venn diagram and show that
(A\B) ∪ (B\A) = (A ∪ B)\(A ∩ B).
1-13 Prove analytically that A ∩ B = A\(A\B) and illustrate your result by means
of a Venn diagram.
1-14 In the following expressions, replace the symbol * by ⇐, by ⇒ or by ⇔ to
make them valid logical statements concerning the sets A, B and an element x:
(a) x ∈ A * x ∈ A ∪ B;
(b) x ∈ B * x ∈ A ∪ B;
(c) x ∈ A * x ∈ A ∩ B;
(d) x ∈ A or x ∈ B or x ∈ A ∩ B * x ∈ A ∪ B;
(e) x ∈ A or x ∈ B, x ∉ A ∩ B * x ∈ (A ∪ B)\(A ∩ B).
Give one example each of the use of ⇒ and ⇔.
1-15 If * is a set operation and it is true that (A * B) * C = A * (B * C), then the
operation * is said to be associative. Use a Venn diagram to prove that
(a) (A ∪ B) ∪ C = A ∪ (B ∪ C) ⇔ A ∪ B ∪ C;
(b) (A ∩ B) ∩ C = A ∩ (B ∩ C) ⇔ A ∩ B ∩ C.
Section 1-2
1-16 Toss a coin 50 times and plot the relative frequency of 'heads'.
1-17 Suggest a graphical representation for the sample space in which the outcome
of tossing three coins might be recorded.
1-18 Suggest a graphical representation for the sample space characterizing the
score recorded in a trial involving the tossing of a die together with a coin
which has faces numbered 1 and 2. Give examples of:
(a) two disjoint subsets of the sample space;
(b) two intersecting subsets of the sample space, indicating the points in their
intersection.
1-19 By using Eqn (1-9) explain why
P(A ∩ B) = P(B) P(A | B) = P(A) P(B | A).
Verify your result by computing P(A), P(B), P(A | B), P(B | A), and P(A ∩ B)
using the sets defined in connection with Fig. 1-6 (b).
1-20 Use a Venn diagram to prove the generalized probability addition rule
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B)
− P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
1-21 Use Theorem 1-2 to prove the generalized probability multiplication rule
P(A ∩ B ∩ C) = P(A) P(B | A) P(C | A ∩ B).
1-22 Complete the argument in Example 1-2 (a).
1-23 A bag contains 30 balls of which 5 are red and the remainder are black. A
trial comprises drawing a ball from the bag at random, recording the result
and then replacing the ball and shaking the bag. This process is called sampling
with replacement. If this process is repeated twice, what is the probability of
selecting
(a) 2 red balls;
(b) 2 black balls;
(c) 1 red and 1 black ball?
1-24 By considering arrangements of the five letters A, B, C, D, E verify that
⁵P₂ = 20 and \( \binom{5}{2} \) = 10.
1-25 How many blends of coffee comprising equal quantities of 4 different types of
coffee bean are possible if 9 different types of coffee bean are available?
1-26 A game involves a team of 5 persons who play sequentially. How many
different teams may be drawn up if 10 players are available?
1-27 On the assumption that a participant in a raffle will buy either 2 or 4 numbered
tickets, how many different sets of tickets may he choose from a book of
20 tickets?
1-28 A coin is biased so that the probability of 'heads' is 0.52. What is the
probability that:
(a) 3 heads will occur in 6 throws;
(b) 3 or more heads will occur in 6 throws?
1-29 Shells fired from a gun have a probability \ of hitting the target. What is the
probability of missing the target if 4 shells are fired?
1-30 Draw the probability density function for the binomial distribution in which
p = J and n = 6. Use your result to draw the corresponding cumulative
distribution function.
1-31 By considering Fig. 1-6 (a) deduce and draw the probability density function
describing the sum of the scores on the two dice.
Section 1-3
1-32 Describe two different ways of defining N rational numbers between 1 and 2.
Generalize one of these methods to interpolate N rationals between any two
rationals a and b.
1-33 Working from the array of rational numbers given in Section 1-3, use arrows
to suggest two alternative schemes to the one already described by which all
the rational numbers may be enumerated. Is this array the only possible one
that may be used ? If not, give an alternative.
1-34 Use the fact that √2 is irrational to prove that if a is a rational number, then
a + √2, a√2 and √2/a are also irrational. Would the results still be true if
√2 were replaced by any other irrational number, and would your proof
still suffice?
1-35 Prove that √3 is irrational. (Hint: first assume that √3 is rational and equal
to p/q, and then obtain a contradiction by considering even and odd values of
q separately.)
1-36 The operation of division is defined in terms of multiplication as indicated in
the following problem. The reader is required to provide the justification for
some familiar arithmetic operations using only the operation of multiplication
and the definition provided. Given that a and b are real numbers and that
b ≠ 0, we define a/b by k = a/b if, and only if, kb = a. Does this define a/b
uniquely? Why is it necessary that b ≠ 0? Show that a/b = ca/cb whenever
c ≠ 0 and that a/b + c/d = (ad + bc)/bd; (a/b)(c/d) = ac/bd; 1/(a/b) = b/a
(a ≠ 0).
1-37 Prove the following statements concerning real numbers by directly applying
the real number axioms:
(a) There is just one zero element and one unit element;
(b) a + ξ = a + η ⇒ ξ = η;
(c) 0 . a = a . 0 = 0;
(d) aξ = aη and a ≠ 0 ⇒ ξ = η;
(e) (−a)b = a(−b) = −(ab); (−a)(−b) = ab;
(f) ab = 0 ⇒ a = 0 or b = 0;
(g) a(b − c) = ab − ac.
1-38 The expression \( \{a_r\}_{r=1}^{n} \) denotes the sequence of numbers a₁, a₂, ..., aₙ.
Given that \( \{a_r\}_{r=1}^{7} \) = 0.2, 3, 1.8, 2.2, 1, 3, 2 and \( \{b_r\}_{r=1}^{7} \) = 0.3, 2, 1.8, 1.1, 2,
4, 1 verify the inequality of Example 1-5.
1-39 Prove that if a > b > 0 and k > 0 then
\[ \frac{b}{a} < \frac{b+k}{a+k} < 1 < \frac{a+k}{b+k} < \frac{a}{b}. \]
1-40 Indicate by means of a diagram the intervals defined by the following expres-
sions, using a dot to signify an end point belonging to an interval and a circle
to indicate an end point excluded from the interval :
(a) (x + 2)(x + 3) < (x − 1)(x − 2);
(b) 0 < |x − 3| < 1;
(c) |x| < 2;
(d) 0 < |2x + 1| < 1;
(e) |3x + 1| > 2;
(f) £±J> < x
2^+2 2(x - 1)
1-41 Identify the regions in the (x, y)-plane determined by the following inequalities.
Mark a boundary that belongs to the region by a full line; a boundary that
does not by a dotted line; an end point that is included in an interval by a dot;
an end point excluded from an interval by a circle:
(a) x 2 + y 2 < 1 ; x < 0; y < —x;
(b) y < sin x; x 2 + v 2 > "- 2 ; y < I ;
(c) ix 2 + v 2 > l; \y\<i;
(d) y >x 2 ; \x - 1 | < 1; y <A.
1-42 Give numerical examples to illustrate Theorem 1-4.
1-43 Prove Theorem 1-4 (b) by considering separately the cases a ≥ 0, b ≥ 0;
a < 0, b < 0; a ≥ 0, b < 0; a < 0, b ≥ 0.
1-44 Express these numbers in binary notation:
(a) 27; (b) lyV; (c) 2-Jf ; (d) i»,
1-45 Express the following numbers in binary notation, and then use your results
to form the expressions a + b, a − b, and ab. Check by interpreting the results
in terms of the base 10:
(a) a = 12, ft = 11 ; (Jo) a = 3iV, * = J; (O a = j», ft = ,'„.
1-46 Give numerical examples to illustrate Theorem 1-4 using binary notation.
1-47 Using the number system to base 3 and the digits 0, 1, 2 represent these
numbers:
(a) 27; (b) 2|; (c) 2J,; (d) 11
1-48 Using the number system to base 3 and the digits 0, 1, 2 write out the addition
and multiplication table for three digits analogous to those of Section 1-5.
1-49 Express the following numbers in terms of the base 3 and use the tables of the
previous problem to evaluate a + b, a — b, and ab. Check by re-interpreting
your results in terms of the base 10.
(a) a = 4, b = 2J; (b) a = 3, b = £; (c) a = •£, b = - 2 V-
1-50 Give an inductive proof that
(a) $\sum_{r=0}^{n-1} (a + rd) = \frac{n}{2}[2a + (n - 1)d]$ (Arithmetic Progression)
(b) $\sum_{r=1}^{n} r^2 = \frac{n(n + 1)(2n + 1)}{6}$ (Sum of Squares)
1-51 The expansion
$$(a + b)^n = a^n + na^{n-1}b + \frac{n(n - 1)}{2!}a^{n-2}b^2 + \cdots + nab^{n-1} + b^n,$$
and the equivalent result
$$(a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^r b^{n-r},$$
are called the binomial expansion. Prove the result inductively for the case
when n is a natural number.
1-52 Give an inductive proof of the results
(a) $\sum_{s=0}^{n-1} r^s = \frac{1 - r^n}{1 - r}$ (Geometric Progression)
(b) $\sum_{s=1}^{n} s^3 = \left[\frac{n(n + 1)}{2}\right]^2$ (Sum of Cubes)
1-53 Find the general solution to the difference equation
$$u_n + u_{n-1} - 6u_{n-2} = 0.$$
Determine the particular solution corresponding to wi = 1, ui = 1.
1-54 Find the general solution to the difference equation
$$u_n - 3u_{n-1} + 2u_{n-2} = 0.$$
Determine the particular solution corresponding to wi = 3, wi = 7.
1-55 Find the solution to the difference equation
$$u_n - 2u_{n-1} + u_{n-2} = 0$$
given that $u_1 = 2$, $u_2 = 3$.
1-56 Find the general solution to the difference equation
$$u_n - 6u_{n-1} + 9u_{n-2} = 0.$$
Determine the particular solution corresponding to $u_1 = 1$, $u_2 = -3$.
Variables, functions, and mappings
2-1 Variables and functions
In the physical world the idea of one quantity depending on another is very
familiar, a typical example being provided by the observed fact that the
pressure of a fixed volume of gas depends on its temperature. This situation
is reflected in mathematics by the notion of a function, which we shall now
discuss in some detail.
The modern definition of a function in the context of real numbers is
that it is a relationship, usually a formula, by which a correspondence is
established between two sets A and B of real numbers in such a manner
that to each number in set A there corresponds only one number in set B.
The set A of numbers is the domain of the function and the set B of numbers
is the range of the function.
If the function or rule by which the correspondence between numbers in
sets A and B is established is denoted by f, and x denotes a typical number in
the domain A of f, then the number in the range B to be associated with x
by the function f is written f(x) and is read 'f of x'. The numbers x and f(x)
are variables with x being given the specific name independent variable and
f(x) the name dependent variable. The independent variable is also often
called the argument of the function f.
It is often helpful to construct the graph of f which mathematically is
the set of ordered number pairs (x, f(x)), where x belongs to the domain of f.
Geometrically the graph of f is usually represented by a plane curve, drawn
relative to an origin defined by the intersection of two perpendicular straight
lines called axes. The process of construction is as follows. A distance
proportional to x is measured along one axis and a distance proportional to f(x)
along the other axis. Through each resulting point on an axis is then drawn a
line parallel to the other axis and these two perpendicular lines intersect at a
unique point in the plane of the axes. This point of intersection is the point
(x, f(x)) and the graph of f is defined to be the locus or curve formed by
joining up all such points corresponding to the domain of f as illustrated
by Fig. 2-1.
However, it is not necessary to use axes of this type, called rectangular
Cartesian axes, and any other geometrical representation which gives unique
representation of the points (x, f(x)) would serve equally well. Thus the
axes could be inclined at an angle $\alpha \neq \tfrac12\pi$ and the scale of measurement along
them need not be uniform. For example, it is often useful to plot the logarithm
of x along the x-axis, rather than x itself. This compresses the x scale so that
large values of x may be conveniently displayed on the graph together with
small values. Another possible representation involves the use of curved
reference axes and leads to curvilinear coordinates. This will be taken up
again later in connection with conformal mapping.
Not every function can be represented in the form of an unbroken curve,
and the function
$$f(x) = \begin{cases} 0 & \text{when } x \text{ is rational,} \\ 1 & \text{when } x \text{ is irrational,} \end{cases}$$
provides an extreme example of this situation. Here, although the graph
would look like a line parallel to the x-axis on which all points have the
value unity, in reality the infinity of points with rational x-coordinates would
be missing since they lie on the x-axis itself. The domain is all the real numbers
R and the range is just the two numbers zero and unity.
Because f transforms one set of real numbers into another set of real
numbers a function is sometimes spoken of as a transformation between
sets of real numbers. On account of the restriction to real numbers or, more
explicitly, to real variables, the function f(x) is called a function of one real
variable. Another name that is often used for a function is a mapping of some
set of real numbers into some other set of real numbers. This name is of course
suggested by the geometrical illustration of the graph of a function and we
shall return more than once to the notion of a mapping. In this terminology,
f(x) is referred to as the image of x under the mapping f.
Since the domain and range of f occur as intervals on the x- and y-axes,
it is convenient to use a simplified notation to identify the form of the interval
that is involved. We now adopt the almost standard notation summarized
below in which a round bracket indicates an open end of an interval, and a
square bracket indicates a closed end of an interval :
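$(a, b) \Leftrightarrow a < x < b$,
$[a, b] \Leftrightarrow a \le x \le b$,
$(a, b] \Leftrightarrow a < x \le b$,
$[a, b) \Leftrightarrow a \le x < b$,
$(-\infty, a] \Leftrightarrow x \le a$,
$[a, \infty) \Leftrightarrow a \le x$,
$(-\infty, \infty) \Leftrightarrow$ all $x \in R$.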
As the definition of open and closed intervals is only a matter of considering
the behaviour of the end points, we shall define the length of all the intervals
(a, b), [a,b), (a, b], and [a, b] to be the number b — a. This is consistent with
the obvious result that the length of an 'interval' comprising only one point
is zero.
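The bracket notation and the definition of length can be tried out in a few lines of code. The following is only a minimal sketch: the class name `Interval` and its field names are illustrative choices, not notation from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Interval from a to b; each end may be open or closed."""
    a: float
    b: float
    closed_left: bool = True
    closed_right: bool = True

    def __contains__(self, x: float) -> bool:
        # A closed end admits the end point itself; an open end does not.
        left_ok = self.a <= x if self.closed_left else self.a < x
        right_ok = x <= self.b if self.closed_right else x < self.b
        return left_ok and right_ok

    def length(self) -> float:
        # The length is b - a whether or not the end points are included.
        return self.b - self.a


half_open = Interval(1.0, 3.0, closed_left=False, closed_right=True)  # (1, 3]
print(1.0 in half_open)             # False
print(3.0 in half_open)             # True
print(half_open.length())           # 2.0
print(Interval(2.0, 2.0).length())  # 0.0
```

The last line reproduces the remark that an 'interval' comprising only one point has length zero.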
Fig. 2-1 Domain, range, and graph of f(x).
It may happen that when x lies within some interval, as for example the
interval (b, c] in Fig. 2-1, each point x is associated with a unique image point
f(x) and, conversely, each image point f(x) is associated with a unique point x.
Such a mapping or function f is then said to be one-one in the domain in
question.
However, there is another possibility that can arise and that is that in some
interval of the x-axis, more than one point x may correspond to the same
image point f(x). This is again well illustrated by Fig. 2-1 if now we consider
the interval [a, b] and the points $x_2$ and $x_3$, both of which have the same image
point since $f(x_2) = f(x_3)$. In situations such as these the mapping or function
f is said to be many-one in the domain in question.
A specific example might help here and we choose for f the function
$f(x) = x^2$ and the two different domains [0, 3] and [-1, 3]. A glance at
Fig. 2-2 shows that f maps the domain [0, 3] onto the range [0, 9] one-one,
but that it maps the domain [-1, 3] onto the same range [0, 9] many-one.
Expressed another way, the range [0, 1] shown as a solid line in the figure is
mapped twice by points in the domain [-1, 3]; once by points in the sub-domain
$-1 \le x \le 0$ and once by points in the sub-domain $0 \le x \le 1$.
Again considering the domain [-1, 3], the function $f(x) = x^2$ maps the
sub-domain $1 < x \le 3$ onto the range (1, 9] one-one.
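The distinction between the two domains can be checked numerically on a finite set of sample points. This is a sketch only: the helper names `is_one_one` and `grid` are hypothetical, and a test on samples can show that a mapping is many-one but can never prove that it is one-one.

```python
from fractions import Fraction


def is_one_one(f, points):
    """Return True if f takes a different value at every sample point."""
    images = [f(x) for x in points]
    return len(set(images)) == len(images)


def square(x):
    return x * x


def grid(lo, hi, steps_per_unit=4):
    """Exact sample points lo, lo + 1/4, ..., hi (integer end points assumed)."""
    n = (hi - lo) * steps_per_unit
    return [Fraction(lo) + Fraction(k, steps_per_unit) for k in range(n + 1)]


print(is_one_one(square, grid(0, 3)))   # True: no repeated image on [0, 3]
print(is_one_one(square, grid(-1, 3)))  # False: (-1)^2 and 1^2 coincide
```

Exact fractions are used for the sample points so that equal images are detected without rounding error.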
In many older books the term function is used ambiguously in that it is
sometimes applied to relationships which do not comply with our definition
Fig. 2-2 Example of many-one mapping in shaded range and a one-one mapping
in the hatched range.
of a function. The most familiar example of this is the 'function' $y = \sqrt{x}$,
which fails to comply with our definition because to every positive x there
correspond two values for y, namely the positive and negative square roots of
x which are equal in magnitude but opposite in sign. A mapping of this kind
is one-many in the sense that to one value of x there corresponds more than
one image point f(x), and although it is permissible to describe this
relationship as a mapping, it is incorrect to term it a function.
Nevertheless, the square root operation is fundamental to mathematics
and we must find some way to make it and similar ones legitimate. The
difficulty is easily resolved if we consider how the square root is used in
applications. In point of fact two different relationships are always considered
which together are equivalent to $y = \sqrt{x}$. These are $y_1 = +\sqrt{x}$ and
$y_2 = -\sqrt{x}$, where the square root is always to be understood to denote the
positive square root and the sign identifies the relationship being considered.
Each of the mappings $y_1(x)$ and $y_2(x)$ of the domain $(0, \infty)$ is one-one as
Fig. 2-3 shows, so that they may each be correctly termed a function, the
particular one to be used in any application being determined by other con-
siderations, such as that the result must be positive or negative. These ideas
will arise again later in connection with inverse functions.
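The two relationships can be written as two separate single-valued functions. The sketch below assumes Python's `math.sqrt`, which always returns the positive square root; the names `y1` and `y2` simply mirror the two branches described above.

```python
import math


def y1(x: float) -> float:
    """Positive branch: y1 = +sqrt(x), for x > 0."""
    return math.sqrt(x)


def y2(x: float) -> float:
    """Negative branch: y2 = -sqrt(x), for x > 0."""
    return -math.sqrt(x)


x = 9.0
print(y1(x), y2(x))            # 3.0 -3.0
print(y1(x) ** 2, y2(x) ** 2)  # 9.0 9.0
```

Each branch returns exactly one value for each x, and both values square back to x.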
Fig. 2-3 The square root function.
In general, if the domain of a function f is not specified then it is understood
to be the largest interval on the x-axis for which the function is defined. So if
$f(x) = x^2 + 4$, then as this is defined for all x, the largest possible domain
must be $(-\infty, \infty)$. Alternatively the function $f(x) = +\sqrt{4 - x^2}$ is only
defined in terms of real numbers when $-2 \le x \le 2$ showing that the largest
possible domain is $[-2, 2]$. Similarly, the function $f(x) = 1/(1 - x)$ is
defined for all x with the sole exception of x = 1 so that the largest possible
domain is the entire x-axis with the single point x = 1 deleted from it.
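These three cases can be probed point by point. The sketch below is an illustration only: the names `f1`, `f2`, `f3` and `is_defined` are hypothetical, and the test relies on Python raising `ValueError` for the square root of a negative number and `ZeroDivisionError` for division by zero.

```python
import math


def f1(x):
    return x**2 + 4


def f2(x):
    return math.sqrt(4 - x**2)


def f3(x):
    return 1 / (1 - x)


def is_defined(f, x):
    """Return True if f(x) can be evaluated as a real number."""
    try:
        f(x)
        return True
    except (ValueError, ZeroDivisionError):
        return False


print(is_defined(f1, -100.0))                     # True
print(is_defined(f2, 2.0), is_defined(f2, 2.5))   # True False
print(is_defined(f3, 1.0), is_defined(f3, 0.99))  # False True
```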
A function need not necessarily be defined for all real numbers on some
interval and, as in probability theory, it is quite possible for the dependent
and independent variables to assume only discrete values. Thus the rule which
assigns to any positive integer n the number of positive integers whose squares
are less than n, defines a perfectly good function. Denoting this function by
f we have for its first few values f(1) = 0, f(2) = 1, f(3) = 1, f(4) = 1,
f(5) = 2, f(6) = 2, f(7) = 2, f(8) = 2, f(9) = 2, f(10) = 3, . . .. Clearly,
both its domain and its range are the set N of natural numbers and the
mapping is obviously many-one.
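The values just listed can be reproduced by direct counting. This is a minimal sketch of the rule as stated, with `f` used as in the text.

```python
def f(n: int) -> int:
    """Number of positive integers k whose square k*k is less than n."""
    count = 0
    k = 1
    while k * k < n:
        count += 1
        k += 1
    return count


print([f(n) for n in range(1, 11)])
# [0, 1, 1, 1, 2, 2, 2, 2, 2, 3]
```

The repeated values in the output show directly that the mapping is many-one.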
Before examining some special functions let us formulate our definition
of a function in rather more general terms. This will be useful later since
although in the above context the relationships discussed have always been
between numbers, in future we shall establish relationships between quantities
that are not simply real numbers. When we do so, it will be valuable if we
can still utilize the notion of a function. This will occur, for example, when
we establish correspondence between quantities called vectors which although
obeying algebraic laws are not themselves real numbers.
The idea of a relationship between arbitrary quantities is one which we
have already started to examine in the previous chapter in connection with
sets. As might be expected, set theory provides the natural language for the
formulation and expression of general ideas associated with functions, and
indeed we have already used the word 'set' quite naturally when thinking of a
set of numbers. A more general definition follows.
Definition 2-1 A function f is a correspondence, often a formula, by
which each element of set A, which is called the domain of f, is associated
with only one element of set B, called the range of f.
To close this section we now provide a few examples illustrating some of
the ideas just mentioned.
Example 2-1 The function y = f(x) defined by the rule
$$f(x) = \frac{1}{(x - 1)(x - 2)}$$
is defined for all real x with the exception of the two points x = 1 and x = 2.
The domain of f is thus the set of real numbers R with the two numbers 1 and
2 deleted. In set notation the domain is $R \setminus \{1, 2\}$. The two lines x = 1 and
x = 2 shown dotted in Fig. 2-4 are called asymptotes to the graph of f and
although the graph approaches arbitrarily close to the asymptotes it never
coincides with them.
Fig. 2-4 Graph of $y = 1/[(x - 1)(x - 2)]$ showing the asymptotes x = 1 and x = 2.
Example 2-2 A discrete valued function may be defined by a table which is
simply an arrangement of ordered number pairs in a sequence.
Table 2-1
X
1
3
7
f(x)
21
4-2
10
6-3
Example 2-3 One possible system of curvilinear coordinates in the first
quadrant may be defined as follows. Using Cartesian coordinates, construct
the set of curves $y = a/x$ and the set of straight lines $y = mx$, each with
domain $(0, \infty)$ and with a > 0, m > 0. Representative examples of these
curves are shown in Fig. 2-5 (a) for the stated values of a and m.
Fig. 2-5 (a) Families of curves $y = a/x$ and $y = mx$; (b) curvilinear coordinates.
In general, any set of curves such as either of these which is derivable
from the same equation by a suitable choice of constant is called a family of
curves, and the constant which is fixed for any one curve but which varies
from curve to curve, is called a parameter. This term parameter will often be
used in contexts which do not involve families of curves, but in every case it
will be used as here in the sense that it implies a 'variable constant'.
Next we disregard the Cartesian axes and the manner of construction of
the two families of curves and regard the two families of curves themselves
as defining new coordinate lines as in Fig. 2-5 (b). Each member of the family
of rectangular hyperbolas will then define a line along which a is constant,
no two members of the family either intersecting or having the same value of
a. Similarly, each member of the family of straight lines through the origin O,
collectively called a pencil of lines, is characterized by a different value of m.
Apart from the single point through which all the straight lines pass and
appropriately called a singular point, there is no ambiguity as to the values of
a and m to be associated with any point in the region of the plane defined
by the two families. We shall use the quantities a and m as our new coordinates
for a point. Graphs may now be constructed using the two families of curves
as curvilinear coordinates. The intersection of a hyperbola and straight line
will define a point in the plane with coordinates given by the ordered number
pair (a, m). Thus the points A, B, and C in Fig. 2-5 (b) have curvilinear
coordinates (J, 1), (1, J), and (2, 4), respectively.
Naturally the graph of the function $y = x^2$ with domain $(0, \infty)$ would
look different when plotted first in Cartesian coordinates and then in these
curvilinear coordinates by setting $a \equiv x$ and $m \equiv y$. They would however be
two different geometrical representations of the same function. Here we have
made use of the useful symbol $\equiv$, which is read 'identically equal to'.
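The labels (a, m) attached to a point of the first quadrant by the two families of this example can be computed explicitly. Since the point lies on both $y = a/x$ and $y = mx$, it follows that $a = xy$ and $m = y/x$, and conversely $x = \sqrt{a/m}$, $y = \sqrt{am}$. The sketch below encodes these relations; the function names are illustrative only.

```python
import math


def to_curvilinear(x, y):
    """(x, y) with x > 0, y > 0 -> (a, m) where y = a/x and y = m*x."""
    return x * y, y / x


def to_cartesian(a, m):
    """Inverse map, valid for a > 0 and m > 0."""
    return math.sqrt(a / m), math.sqrt(a * m)


a, m = to_curvilinear(2.0, 4.0)
print(a, m)                # 8.0 2.0
print(to_cartesian(a, m))  # (2.0, 4.0)
```

The origin is excluded, in keeping with its role as the singular point through which every line of the pencil passes.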
Example 2-4 This example is a final illustration of our more general
definition of a function. Take as the domain of the function f the set A of all
people, and as the range B of the function f the set of all towns in the world.
Then for the function f we propose the rule that assigns to every person his
place of birth.
Clearly this function defines a many-one mapping of set A onto set B,
since although a person can only be born in one place, many other people
may have the same place of birth. This example also serves to distinguish
clearly between the concept of a 'function' which is the rule of assignment,
and the concept of the 'variables' associated with the function which here are
people and places.
2-2 Inverse functions
In the previous section we remarked that a typical example of a correspon-
dence between physical quantities was the observed fact that the pressure of a
fixed volume of gas depends on its temperature. Expressed in this form we are
implying that the dependent variable is the pressure p and the independent
variable is the temperature T, so that the law relating pressure to temperature
has the general form
$$p = \phi(T), \qquad \text{(A)}$$
where $\phi$ is some function that is determined by experiment.
However, we know from experience that in thermodynamics it is often
necessary to interchange these roles of dependent and independent variables
and sometimes to regard the temperature T as the dependent variable and the
pressure p as the independent variable, when the temperature-pressure law
then has the form
$$T = \psi(p), \qquad \text{(B)}$$
where, naturally, the function $\psi$ is dependent on the form of the function $\phi$.
Indeed, formally, $\phi$ and $\psi$ must obviously satisfy the identity $\phi[\psi(p)] = p$
for all pressures p in the domain of $\psi$.
The relationships (A) and (B) are particular cases of the notion of a
function and its inverse and the idea is successful in this context because the
correspondence between temperature and pressure is known to be one-one.
Consider a general case of a function
$$y = f(x) \qquad (2\text{-}1)$$
that is one-one and defined on the domain [a, b], together with its inverse
$$x = g(y) \qquad (2\text{-}2)$$
which has for its domain the interval [c, d] on the y-axis.
Fig. 2-6 (a) Inversion through the graph of f(x); (b) inversion by reflection in y = x.
Graphically the process of inversion may be accomplished point by point
as indicated in Fig. 2-6 (a). This amounts to selecting a point y in [c, d] and
then finding the corresponding point x in [a, b] by projecting horizontally
from y until the graph of f is intercepted, after which a projection is made
vertically downwards from this intercept to identify the required point on the
x-axis.
The relationship between a function and its inverse is represented in
Fig. 2-6 (b). In this diagram we have used the fact that when a function is
represented as an ordered number pair, interchange of dependent and inde-
pendent variables corresponds to interchange of numbers in the ordered
number pair. The lower curve represents the function y = f(x) and the upper
curve represents the function y = g(x), with the function g inverse to f;
both graphs being plotted using the same axes. The line y = x is also shown
on the graph to emphasize that geometrically the relationship between a
one-one function and its inverse is obtained by reflecting the graph of either
function in a mirror held along the line y = x. Henceforth such a process will
simply be termed reflection in a line. Notice that when using this reflection
property to construct the graph of an inverse function from the graph of the
function itself, both functions are represented with y plotted vertically and
x plotted horizontally. This follows because the range of f is the domain of
g, and vice versa.
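The point-by-point inversion described above has a simple numerical counterpart. The sketch below is one possible approach, not a method taken from the text: it assumes that f is continuous and strictly increasing on [a, b] and locates x = g(y) by repeated halving of the interval. The names `inverse` and `cube_plus` are hypothetical.

```python
def inverse(f, a, b, tol=1e-12):
    """Return g with g(y) close to the x in [a, b] for which f(x) = y.

    Assumes f is continuous and strictly increasing on [a, b].
    """
    def g(y):
        lo, hi = a, b
        if not (f(lo) <= y <= f(hi)):
            raise ValueError("y lies outside the range of f on [a, b]")
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < y:
                lo = mid   # the required x lies to the right of mid
            else:
                hi = mid   # the required x lies at or to the left of mid
        return (lo + hi) / 2
    return g


def cube_plus(x):
    return x**3 + x        # strictly increasing on [0, 2]


g = inverse(cube_plus, 0.0, 2.0)
print(round(g(2.0), 6))             # 1.0, since f(1) = 2
print(round(cube_plus(g(7.0)), 6))  # 7.0
```

Interchanging the entries of each ordered pair (x, f(x)) gives the pairs (f(x), x) of the inverse, which is the reflection in the line y = x.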
No difficulty can arise in connection with a function and its inverse
because of the one-one nature of the mapping. Expressed more precisely, we
have used the obvious property illustrated by Fig. 2-6 (a) that a one-one
function f with domain [a, b] is such that $f(x_1) = f(x_2) \Rightarrow x_1 = x_2$ for all $x_1$
and $x_2$ in [a, b].
In graphical terms this result can only be true if the graph of f either
increases or decreases steadily as x increases from a to b. When either of
these properties is true of a function then it is said to be strictly monotonic.
In particular, if a function f increases steadily as x increases from a to b, as
in Fig. 2-6 (a), then it is said to be strictly monotonic increasing and, conversely,
if it decreases steadily then it is said to be strictly monotonic decreasing.
Slightly less stringent than the condition of strict monotonicity is the
condition that a function f be just monotonic. This is the requirement that f
be either non-decreasing or non-increasing, so that it is permissible for a
function that is only monotonic to remain constant throughout some part
of its domain of definition. The adjectives increasing and decreasing are again
used to qualify the noun monotonic in the obvious manner. Representative
examples of monotonic and strictly monotonic functions, all with domain of
definition [a, b] are shown in Fig. 2-7.
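The four cases can be told apart for a finite list of function values taken at increasing values of x. This is a sketch only: the function name `classify` is hypothetical, and a conclusion drawn from samples says nothing about the behaviour of the function between them.

```python
def classify(values):
    """Classify function values listed in order of increasing x."""
    pairs = list(zip(values, values[1:]))
    if all(u < v for u, v in pairs):
        return "strictly monotonic increasing"
    if all(u > v for u, v in pairs):
        return "strictly monotonic decreasing"
    if all(u <= v for u, v in pairs):
        return "monotonic increasing"
    if all(u >= v for u, v in pairs):
        return "monotonic decreasing"
    return "not monotonic"


print(classify([1, 2, 4, 7]))  # strictly monotonic increasing
print(classify([5, 3, 3, 1]))  # monotonic decreasing
print(classify([1, 3, 2]))     # not monotonic
```

The second list contains a repeated value, so it is monotonic without being strictly monotonic.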
Fig. 2-7 Monotonic and strictly monotonic functions: (a) monotonic; (b) strictly
monotonic.
The example of a strictly monotonic decreasing function shown in Fig.
2-7 (b) has also been used to emphasize that a function need not be repre-
sented by an unbroken curve. The curve has a break at the single point
$x = \alpha$ where it is defined to have the value $y = \beta$. However, as the value $\beta$
lies between the functional values on adjacent sides of $x = \alpha$ the function is
still strictly monotonic decreasing. Had we set $\beta = 0$, say, then the function
would be neither strictly monotonic nor even monotonic on account of this
one point!
It is sometimes useful to relate a function and its inverse by essentially
the same symbol and this is usually accomplished by adding the superscript
minus one to the function. Thus the function inverse to f is often denoted by
$f^{-1}$ which is not, of course, to be misinterpreted to mean $1/f$. Before examining
some important special cases of inverse functions when many-one mappings
are involved, let us formalize our previous arguments.
Definition 2-2 Let the set onto which the one-one function f with domain
[a, b] maps the set S of points be denoted by f(S). Then we define the inverse
mapping $f^{-1}$ of f(S) onto S by the requirement that $f^{-1}(y) = x$ if and only if
y = f(x) for all x in [a, b].
It now only remains for us to consider how some important special functions
such as $y = x^2$, $y = \sin x$, and $y = \cos x$, together with other simple
trigonometric functions which are all many-one mappings, may have
unambiguous inverses defined.
Firstly, as we have already seen, the function $y = x^2$ gives a many-one
mapping of $[-a, a]$ onto $[0, a^2]$. Here the difficulty of defining an inverse is
resolved by always taking the positive square root and defining two different
inverse functions
$$x = +\sqrt{y} \quad \text{and} \quad x = -\sqrt{y},$$
which are then both one-one mappings of $(0, a^2]$. The inversion must thus
be regarded as having given rise to two different functions; the one to be
selected depending on other factors as mentioned in connection with Fig.
2-3. If we recall that the domain of definition of a function forms an intrinsic
part of the definition of that function, then $y = x^2$ may be regarded as two
one-one mappings in accordance with the two inverses just introduced.
This is achieved by defining the many-one function $y = x^2$ on the domain
$[-a, a]$ as the result of the two different one-one mappings
$$y = x^2 \text{ on } -a \le x < 0 \quad \text{and} \quad y = x^2 \text{ on } 0 < x \le a,$$
the difference here being only in the domains of definition. The point 0 is
excluded from both domains since that single point maps one-one. By means
of this device we may, in general, reduce many-one mappings to a set of
one-one mappings so that the inversion problem is always straightforward.
It will suffice to discuss in detail only the inversion of the sine function,
after which a summary of the results for the other elementary trigonometric
functions will be presented in the form of a table. In general, as shown in
Fig. 2-8 (a), the function y = sin x maps an argument x in the set R of real
numbers onto [-1, 1] many-one, but it maps any of the restricted domains
$[(2n - 1)\tfrac12\pi, (2n + 1)\tfrac12\pi]$ corresponding to integral n onto [-1, 1] one-one.
Fig. 2-8 Principal branch of sine function: (a) principal branch of sin x giving
one-one mapping in $[-\tfrac12\pi, \tfrac12\pi]$; (b) inversion of sin x by reflection in y = x.
Now in line with our approach to the inverse of the square root function,
the ambiguity as regards the function inverse to sine may be completely
resolved if we consider the many-one function y = sin x with x e R as being
replaced by an infinity of one-one functions y = sin x, with domains
$[(2n - 1)\tfrac12\pi, (2n + 1)\tfrac12\pi]$. For then in each domain corresponding to some
integral value of n, because the mapping there is one-one, an appropriate
inverse function may be defined without difficulty.
The intervals are all of length $\pi$ and are often said to define different
branches of the inverse sine function. In general, when no specific interval is
named we shall write x = Arcsin y, whenever y = sin x. The function
Arcsine thus denotes an arbitrary branch of the inverse sine function.
Because of the periodicity of the sine function, when considering the inverse
function it is only necessary to study the behaviour of one branch of Arcsine.
As is customary, we arbitrarily choose to work with the branch of the inverse
sine function associated with the domain $[-\tfrac12\pi, \tfrac12\pi]$, calling this the principal
branch and denoting the inverse function associated with this branch by
arcsine. Hence for the inverse we shall always write x = arcsin y when
y = sin x and $-\tfrac12\pi \le x \le \tfrac12\pi$.
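Any other branch can be obtained from the principal one. Writing $x = n\pi + t$ with $-\tfrac12\pi \le t \le \tfrac12\pi$ gives $\sin x = (-1)^n \sin t$, so on the branch with index n the inverse is $x = n\pi + (-1)^n \arcsin y$. The sketch below uses this relation, which is derived here rather than quoted from the text; `arcsin_branch` is a hypothetical name and `math.asin` is taken as the principal branch.

```python
import math


def arcsin_branch(y: float, n: int = 0) -> float:
    """Inverse sine on the branch [(2n - 1)pi/2, (2n + 1)pi/2]."""
    sign = 1 if n % 2 == 0 else -1
    return n * math.pi + sign * math.asin(y)


y = 0.5
for n in (-1, 0, 1):
    x = arcsin_branch(y, n)
    lo = (2 * n - 1) * math.pi / 2
    hi = (2 * n + 1) * math.pi / 2
    # Each value lies in its own interval and has sine equal to y.
    print(n, lo <= x <= hi, math.isclose(math.sin(x), y))
```

With n = 0 the function reduces to the principal branch arcsine.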
In Fig. 2-8 (b) is shown in relation to the line y = x the function y = sin x
with domain of definition $[-\tfrac12\pi, \tfrac12\pi]$ and the associated function y = arcsin x
with domain of definition [—1, 1]. The reflection property of inverse functions
utilized in connection with Fig. 2-6 (b) is again apparent here. It should
perhaps again be emphasized that when an inverse function is obtained by
reflection in the line y = x, then in both the curves representing the function
and its inverse, the variable y is plotted as ordinate (i.e. vertically) and the
variable x as abscissa (i.e. horizontally).
Table 2-2 summarizes information concerning the most important inverse
trigonometric functions and should be studied in conjunction with Fig. 2-9.
In general the notation for a function inverse to a named trigonometric
function is obtained by adding the prefix arc when referring to the principal
branch and Arc otherwise. In other books the convention is often to add the
superscript minus one after the named function, distinguishing the principal
branch by use of an initial capital letter when writing the function. Thus, for
example, some authors will write $\mathrm{Sin}^{-1}$ in place of arcsine and $\sin^{-1}$ in place
of Arcsine. Unfortunately notations are not uniform here and so when using
other books the reader would be well advised to check the notation in use.
Fig. 2-9 Principal branches of inverse cosine and tangent functions: (a) principal
branch of cos x; (b) inversion of cos x by reflection in y = x; (c) principal branch of
tan x; (d) inversion of tan x by reflection in y = x.
Table 2-2 Trigonometric functions and their inverse functions

Function      Domain                                          Inverse function                   Branch      Domain
$y = \sin x$  $[-\tfrac12\pi, \tfrac12\pi]$                   $y = \arcsin x$                    Principal   $[-1, 1]$
$y = \sin x$  $[(2n - 1)\tfrac12\pi, (2n + 1)\tfrac12\pi]$    $y = \operatorname{Arcsin} x$      Any         $[-1, 1]$
$y = \cos x$  $[0, \pi]$                                      $y = \arccos x$                    Principal   $[-1, 1]$
$y = \cos x$  $[n\pi, (n + 1)\pi]$                            $y = \operatorname{Arccos} x$      Any         $[-1, 1]$
$y = \tan x$  $(-\tfrac12\pi, \tfrac12\pi)$                   $y = \arctan x$                    Principal   $(-\infty, \infty)$
$y = \tan x$  $((2n - 1)\tfrac12\pi, (2n + 1)\tfrac12\pi)$    $y = \operatorname{Arctan} x$      Any         $(-\infty, \infty)$
2-3 Some special functions
A number of special types of function occur often enough to merit some
comment. As the ideas involved in their definition are simple, a very brief
description will suffice in all but a few cases. To clarify these descriptions, the
functions are illustrated in Fig. 2-10.
(a) Constant function
The constant function is a function y = f(x) for which f(x) is identically
equal to some constant value for all x in the domain of definition [a, b].
Thus a constant function has the equation y = constant, for $x \in [a, b]$.
(b) Step function
Consider some set of n sub-intervals or partitions $[a_0, a_1)$, $[a_1, a_2)$, $[a_2, a_3)$,
. . ., $[a_{n-1}, a_n]$ of the interval $[a_0, a_n]$. Associate n constants $C_1, C_2, \ldots, C_n$
with these n sub-intervals. Then a step function defined on $[a_0, a_n]$ is the
function y = f(x) for which $f(x) = C_r$, for all x in the rth sub-interval. The
function will be properly defined provided a functional value is assigned to
all points x in $[a_0, a_n]$ including end points of the intervals. Usually it is
Fig. 2-10 (opposite) Some special functions: (a) constant function; (b) step function;
(c) y = |x| ; (d) even function; (e) odd function; (f) bounded function on [a, b].
immaterial to which of two adjacent sub-intervals an end point is assigned
and one possible assignment is indicated in Fig. 2-10 (b), where a deleted end
point is shown as a circle and an included end point as a dot.
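A step function of this kind is easily programmed once an assignment of end points has been chosen. The sketch below adopts the partition written above, in which every sub-interval is closed on the left and open on the right except the last, which is closed at both ends. The name `make_step` and the sample data are illustrative assumptions.

```python
from bisect import bisect_right


def make_step(breaks, constants):
    """Step function on [breaks[0], breaks[-1]].

    The r-th sub-interval is [breaks[r-1], breaks[r]), except the last,
    which also contains its right-hand end point.
    """
    if len(constants) != len(breaks) - 1:
        raise ValueError("need exactly one constant per sub-interval")

    def f(x):
        if not breaks[0] <= x <= breaks[-1]:
            raise ValueError("x lies outside the domain")
        r = bisect_right(breaks, x) - 1      # zero-based sub-interval index
        return constants[min(r, len(constants) - 1)]

    return f


f = make_step([0, 1, 2, 4], [5, -1, 3])
print(f(0), f(0.5), f(1), f(3.9), f(4))  # 5 5 -1 3 3
```

At the interior end point x = 1 the value is that of the sub-interval to the right, in line with the chosen convention.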
(c) The function |x|
From the definition of the absolute value of x it is easily seen that the graph
of y = |x| has the form shown in Fig. 2-10 (c). It is composed of the line
y = x for $x \ge 0$ and the line y = -x for x < 0.
(d) Even function
An even function y = f(x) is a function for which f(-x) = f(x). The
geometrical implication of this definition is that the graph of an even function is
symmetrical about the y-axis so that the graph for negative x is the reflection
in the y-axis of the graph for positive x. Typical examples of even functions
are $y = \cos x$, $y = 1/(1 + x^2)$ and the function y = |x| just defined.
(e) Odd function
An odd function y = f(x) is a function for which f(-x) = -f(x). The
geometrical implication of this definition is that the graph of an odd function is
obtained from its graph for positive x by first reflecting the graph in the
y-axis and then reflecting the result in the x-axis. In Fig. 2-10 (e) the result of
the first reflection is shown as a dotted curve and its reflection in the x-axis
gives a second curve shown as a full line in the third quadrant which,
together with the original curve in the first quadrant, defines the odd function.
By virtue of the definition we must have f(0) = -f(0), showing that the
graph of an odd function must pass through the origin. Typical odd functions
are $y = \sin x$ and $y = x^3 - 3x$. Most functions are neither even nor odd.
For example, $y = x^3 - 3x + 1$ is not even, since $y(-x) = (-x)^3 - 3(-x) + 1
= -x^3 + 3x + 1 \neq y(x)$, nor, by the same argument, is it odd, for
$y(-x) \neq -y(x)$.
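The defining relations for even and odd functions can be tested at a few sample points. The sketch below is illustrative only: `parity` is a hypothetical name, the sample points are arbitrary, and agreement at sample points suggests but does not prove that a function is even or odd.

```python
import math


def parity(f, samples=(0.5, 1.0, 2.0, 3.0)):
    """Compare f(-x) with f(x) and with -f(x) at the sample points."""
    even = all(math.isclose(f(-x), f(x), abs_tol=1e-12) for x in samples)
    odd = all(math.isclose(f(-x), -f(x), abs_tol=1e-12) for x in samples)
    if even and odd:
        return "even and odd"   # only possible if f vanishes at the samples
    if even:
        return "even"
    if odd:
        return "odd"
    return "neither"


print(parity(math.cos))                   # even
print(parity(lambda x: x**3 - 3*x))       # odd
print(parity(lambda x: x**3 - 3*x + 1))   # neither
```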
(f) Bounded function
A function y =f(x) is said to be bounded on an interval if it is never
larger than some value M and never smaller than some value m for all values
of x in the interval. The numbers M and m are called, respectively, upper and
lower bounds for the function /(x) on the interval in question. It may of
course happen that only one of these conditions is true, and if it never exceeds
M then it is said to be bounded above, whereas if it is never less than m it is
said to be bounded below. A bounded function is thus a function that is
bounded both above and below. The bounds M and m need not be strict in
the sense that the function ever actually attains them. Sometimes when the
bounds are strict they are only attained at an end point of the domain of
definition of the function.
Of all the possible upper bounds M that may be assigned to a function
that is bounded above on some interval, there will be a smallest one M', say.
Such a number M' is called the least upper bound or the supremum of the
function on the interval and the name is usually abbreviated to l.u.b. or to
sup. Similarly, of all the possible lower bounds m that may be assigned to a
function that is bounded below on some interval, there will be a largest one
m', say. Such a number m' is called the greatest lower bound or the infimum
of the function on the interval and the name is usually abbreviated to g.l.b.
or to inf.
Not all functions are bounded either above or below, as evidenced by the
function $y = \tan x$ on $(-\pi/2, \pi/2)$, though it is bounded on any closed
sub-interval not containing either end point. Typical examples of bounded
functions on the interval $(-\infty, \infty)$ are $y = \sin x$ and
$y = \cos x/(1 + x^2)$. The function $y = 1/(x - 1)$ is bounded below by zero on
the interval $(1, \infty)$ but is unbounded above, whereas the function
$y = 2 - x^2$ is strictly bounded above by 2 but is unbounded below on the
interval $(-\infty, \infty)$.
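Bounds of this kind can be estimated by sampling. The sketch below is an illustration only: the helper `sampled_bounds`, the finite window [-10, 10] and the number of sample points are assumptions, and a sampled minimum or maximum merely estimates the infimum or supremum on that window; it says nothing about behaviour outside it.

```python
import math

def sampled_bounds(f, a, b, n=20001):
    """Return (smallest, largest) value of f at n equally spaced points of [a, b]."""
    xs = [a + (b - a) * k / (n - 1) for k in range(n)]
    ys = [f(x) for x in xs]
    return min(ys), max(ys)

# y = sin x: bounded, with bounds -1 and 1.
lo1, hi1 = sampled_bounds(math.sin, -10.0, 10.0)

# y = cos x / (1 + x^2): bounded above by 1, attained at x = 0.
lo2, hi2 = sampled_bounds(lambda x: math.cos(x) / (1 + x * x), -10.0, 10.0)

# y = 2 - x^2: bounded above by 2, but the sampled minimum keeps
# decreasing as the window is widened, reflecting unboundedness below.
lo3, hi3 = sampled_bounds(lambda x: 2 - x * x, -10.0, 10.0)

print(lo1, hi1)
print(lo2, hi2)
print(lo3, hi3)
```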
(g) Convex and concave functions
A convex function is one which has the property that a chord joining any
two points A and B on its graph always lies above the graph of the function
contained between those two points. Similarly, a concave function is one
which has the property that a chord joining any two points A and B on its
graph always lies below the graph of the function contained between those
two points. Thus the function $y = |x|$ shown in Fig. 2-10 (c) is convex on the
interval $(-\infty, \infty)$ whereas the function shown in Fig. 2-10 (d) is only
concave on the closed interval $[-a, a]$.
(h) Polynomial and rational functions
A polynomial of degree n is an algebraic expression of the form
$$y = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0,$$
where n is a positive integer and it is defined for all x.
A rational function is a function which is capable of expression as the
quotient of two polynomials and so has the form
$$y = \frac{b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0}{a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0}$$
and is defined for all values of x for which the denominator does not vanish.
An example of a polynomial of degree 2 is the quadratic function
$y = x^2 - 3x + 4$; a typical rational function is
$$y = \frac{3x^2 - 2x - 1}{4x^3 + 11x^2 + 5x - 2}$$
which is defined for all values of x apart from $x = -2$, $x = -1$, and $x = 1/4$,
at which points the denominator vanishes. For this reason these values are
called the zeros of the polynomial forming the denominator and they arise
directly from its factorization into the form
$$4x^3 + 11x^2 + 5x - 2 \equiv (4x - 1)(x + 2)(x + 1).$$
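The factorization of the denominator, and the resulting zeros, can be checked by direct multiplication. The following sketch uses exact rational arithmetic; the helper names `poly_mul` and `poly_eval` and the coefficient-list representation (highest power first) are assumptions chosen for the illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_eval(p, x):
    """Evaluate a polynomial at x by Horner's rule."""
    value = 0
    for c in p:
        value = value * x + c
    return value

# Denominator 4x^3 + 11x^2 + 5x - 2 of the rational function in the text.
denominator = [4, 11, 5, -2]

# Product (4x - 1)(x + 2)(x + 1).
product = poly_mul(poly_mul([4, -1], [1, 2]), [1, 1])

# Zeros read off from the factors.
zeros = [Fraction(-2), Fraction(-1), Fraction(1, 4)]
values_at_zeros = [poly_eval(denominator, z) for z in zeros]

print(product)          # coefficients of the expanded product
print(values_at_zeros)  # each entry should vanish
```

Using `Fraction` rather than floating-point numbers keeps the check at $x = 1/4$ exact.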
(i) Algebraic function
An algebraic function arises when attempting to form the inverse of a rational
function. The function $y = +\sqrt{x}$ for $x > 0$ provides a typical example here.
More complicated examples are the functions:
$$y = x^{2/3}, \qquad y = x^2 + 2\sqrt{x} - 1, \qquad y = x\sqrt{x/(2 - x)}.$$
More precisely, we shall call the function $y = f(x)$ algebraic if it may be
transformed into a polynomial involving the two variables x and y, the
highest powers of x and y both being greater than unity. This criterion may
easily be applied to any of the above examples. In the case of the last example,
squaring and clearing the fraction soon shows that it is equivalent to the polynomial
$2y^2 - xy^2 - x^3 = 0$, which is of degree 2 in y and 3 in x.
(j) Transcendental function
A function is said to be transcendental if it is not algebraic. A simple example
is y = x + sin x, which is defined for all x but is obviously not algebraic.
(k) The function [x]
On occasions when working with quantities that may only assume integral
values it is useful to write y = [x] with the meaning that we assign to every
real number x the greatest integer y that is less than or equal to it. Thus, for
example, we have $[-3] = -3$, $[-1.3] = -2$, $[0] = 0$, $[0.92] = 0$,
$[\pi] = 3$, and $[17] = 17$.
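In most programming languages the function [x] is available as a "floor" function. The short sketch below uses Python's standard `math.floor`; the list of sample values follows the examples in the text, with $\pi$ included as an additional illustrative value. It also shows why truncation towards zero is not the same operation for negative arguments.

```python
import math

samples = [-3, -1.3, 0, 0.92, math.pi, 17]

# Greatest integer less than or equal to x.
floors = [math.floor(x) for x in samples]

# Truncation towards zero, shown only for contrast.
truncations = [int(x) for x in samples]

for x, fl, tr in zip(samples, floors, truncations):
    print(f"x = {x}: [x] = {fl}, truncation gives {tr}")
```

The two columns differ only at x = -1.3, where [x] = -2 but truncation gives -1.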
2-4 Digression on mappings
Having now examined in some detail specific examples of functions providing
one-one and many-one mappings, it will be helpful to take a slightly more
general look at the notion of a mapping. We again appeal to the Venn
diagram, but this time supplement it by the addition of arrows to suggest the
form of mapping that is involved.
In Fig. 2-11 pairs of closed curves have been used to represent the sets
A and B postulated in the formulation of the more general definition of a
function f given in Definition 2-1. Once again points inside a curve represent
elements in the set, with set A representing the domain of the function f and
set B the range of f. The arrows relating sets A and B in the three pairs of
diagrams are then self-explanatory when taken in conjunction with the
captions.
Fig. 2-11 Mappings: (a) B = f(A), a one-many mapping; (b) B = f(A), a many-one
mapping; (c) B = f(A), a one-one mapping.
The mappings illustrated in Fig. 2-11 are often said to be onto mappings,
in the sense that the set A is mapped by function f onto the entirety of set B.
Thus, in each case, every element in B is associated with at least one element
in A. Naturally if some set C containing B is considered in place of B, then
there will be elements of C that are not associated with any element in A. The
mapping of A into C by f is then said to be an into mapping.
For example, if the function concerned is $y = x^2$, then it maps the set A
comprising the interval $[1, 2]$ into the set C comprising the interval $[1, 9]$,
but onto the set B comprising the interval $[1, 4]$.
These ideas are of real importance when a double mapping is involved,
for then it is necessary to examine the relationship that exists between the
range of the first function and the domain of definition of the second. If the
first mapping is by a function f and the second mapping is by a function g,
then the result of the successive mappings is called the composition of f and g
and is usually denoted by $f \circ g$. The order implies that f is the first mapping
which is then followed by g. Using perhaps more familiar terminology and
notation we are speaking here of the 'function of a function' $g\{f(x)\}$.
The general ideas involved here are illustrated in Fig. 2-12. There (a) and
(b) indicate the respective domains and ranges of f and g whilst (c) indicates
how, in general, the function $f \circ g$ has for its domain only part of A and for
its range only part of D.
Fig. 2-12 (a) Mapping by f of A onto B; (b) mapping by g of C onto D; (c) composition
$f \circ g$, showing the elements common to the range of f and the domain of g.
The symbolic representation suggested in Fig. 2-12 can be made more
meaningful by considering the following. Let $f(x) = 3x + 1$ with domain
$(-\infty, 4/3]$ and $g(x) = +\sqrt{9 - x}$ with domain $[1, 9]$. Then the range of f
is $(-\infty, 5]$ and the range of g is $[0, 2\sqrt{2}]$. The range of f thus only coincides
with the domain of g in the interval $[1, 5]$. Hence the part of the domain of g
that is common to the range of f is a one-one mapping by f of the interval
$[0, 4/3]$. This interval must then be the domain of $f \circ g$. Next, the function g
maps $[1, 5]$ onto the interval $[2, 2\sqrt{2}]$, which must be the range of $f \circ g$.
Thus we have obtained the following:
    Domain of f:             $(-\infty, 4/3]$,
    Range of f:              $(-\infty, 5]$,
    Domain of g:             $[1, 9]$,
    Range of g:              $[0, 2\sqrt{2}]$,
    Domain of $f \circ g$:   $[0, 4/3]$,
    Range of $f \circ g$:    $[2, 2\sqrt{2}]$.
Using direct algebraic substitution we see that in fact if $f(x) = 3x + 1$ and
$g(x) = +\sqrt{9 - x}$, then $f \circ g = g\{f(x)\} = +\sqrt{9 - (3x + 1)} = +\sqrt{8 - 3x}$.
This confirms directly that $f \circ g$ maps $[0, 4/3]$ onto $[2, 2\sqrt{2}]$, but does not
take explicit account of the effect of the domain of g on the mapping.
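The role played by the two domains can be made explicit in a small computation. The sketch below is illustrative only: the predicate names `in_domain_f` and `in_domain_g`, the sample grid with spacing 1/300, and the lower cut-off of -10 are assumptions. A point x is admitted to the composition only if x lies in the domain of f and f(x) lies in the domain of g.

```python
import math

def f(x):
    return 3 * x + 1            # stated domain (-inf, 4/3]

def g(u):
    return math.sqrt(9 - u)     # stated domain [1, 9]

def in_domain_f(x):
    return x <= 4 / 3

def in_domain_g(u):
    return 1 <= u <= 9

# Sample points k/300 running from -10 up to 4/3.
samples = [k / 300 for k in range(-3000, 401)]

# x is admissible only if f(x) lands inside the domain of g.
admissible = [x for x in samples if in_domain_f(x) and in_domain_g(f(x))]
values = [g(f(x)) for x in admissible]

domain_estimate = (min(admissible), max(admissible))
range_estimate = (min(values), max(values))

print(domain_estimate)   # approximately (0, 4/3)
print(range_estimate)    # approximately (2, 2*sqrt(2))
```

The substituted formula $\sqrt{8 - 3x}$ alone would also accept negative x, which is why the domain check on g has to be carried separately.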
2-5 Curves and parameters
A parameter $\alpha$ may be associated with a curve in two quite different ways.
In the first situation we shall discuss, the parameter $\alpha$ occurs as a constant
in the equation describing the curve. Thus changing the value of $\alpha$ will change
the curve that is described. This simple idea underlies the geometrical concept
of an envelope, which will be taken up again later in connection with
differentiation and with differential equations.
In the second situation, $\alpha$ will appear as a variable associated with two
functions $s(\alpha)$ and $t(\alpha)$, which will describe separately the x and y coordinates
of points on any unbroken curve. This use of a parameter is called the
parameterization of a curve and is an alternative method of representing the
equation of the curve.
(a) Envelopes
This situation is best explained by means of an example. Consider the equation
$$(x - \alpha)^2 + y^2 = \frac{\alpha^2}{1 + \alpha^2},$$
which in this form is easily seen to describe a circle of radius $|\alpha|/\sqrt{1 + \alpha^2}$
with its centre on the x-axis at the point $x = \alpha$. Obviously, changing $\alpha$ will
both move the centre of the circle and alter its radius, as shown in Fig. 2-13.
Fig. 2-13 Envelope shown as dotted line.
If $\alpha$ is allowed to vary in some interval, then the single equation will
describe a set of circles, each one corresponding to a different value assumed
by $\alpha$ in that interval. Collectively these circles are a family of circles with
parameter $\alpha$. If a curve exists that is tangent to every member of a family of
curves, but is not itself a member of the family, then it is an envelope of the
family. An envelope can be a curve of infinite length or on occasions it may
reduce either to a curve of finite length or, in degenerate cases, to a single
point.
In Fig. 2-13 the envelope is shown as a dotted curve and, as would be
expected in this case, the envelope is symmetrical about both the x- and y-axes.
If the family of circles that led to this envelope is written in the form
$$(x - \alpha)^2 + y^2 - \frac{\alpha^2}{1 + \alpha^2} = 0,$$
then it is seen to be a special case of an equation in three variables having the
general form
$$f(x, y, \alpha) = 0. \qquad (2\text{-}3)$$
This is the standard form for an equation defining a family of curves with
parameter $\alpha$ and it will be used later to determine the equation of the envelope
when it exists.
However, it is easy to see that a family of curves does not always have an
envelope associated with it, since the concentric circles $x^2 + y^2 = \alpha^2$ form a
perfectly good family with parameter $\alpha$, but clearly there is no line that is
tangent to each circle in the family.
Expression (2-3) is an implicit representation of a function in the sense
that it is not directly obvious how and when it is possible to re-express it in
the more familiar explicit form
$$y = F(x, \alpha). \qquad (2\text{-}4)$$
(b) Parameterization of a curve
We have seen that when a curve is represented by an explicit equation of the
form $y = f(x)$, then for inversion reasons the mapping must be one-one.
In other words, either f must be strictly monotonic in its domain of definition
or, if not, it must be expressible piecewise as a set of new functions which are
strictly monotonic on suitably chosen domains.
A more general representation of a curve that overcomes the necessity
for sub-division of the domain, and even allows curves with loops, may be
achieved by the introduction of the notion of parametric representation of a
curve. The idea here is simple and is that instead of considering x and y
to be directly related by some function f, we instead consider x and y
separately to be functions of the variable parameter $\alpha$. Thus we arrive at the pair of
equations
$$x = s(\alpha), \qquad y = t(\alpha), \qquad (2\text{-}5)$$
with $a \le \alpha \le b$, say, which together define a curve. For any value of $\alpha$ in
$[a, b]$ we can use these equations to determine unique values of x and y,
and hence to plot a single point on the curve represented parametrically by
Eqn (2-5). The set of all points described by Eqn (2-5) then defines a curve.
As a simple example of a curve without loops we may consider the
parametric equations
$$x = \alpha, \qquad y = \alpha^2 \qquad \text{for } -\infty < \alpha < \infty.$$
These obviously define a parabola that lies in the upper half plane and is
symmetrical about the y-axis with its vertex at the origin.
Elimination of $\alpha$ is easy here and results in the explicit representation $y = x^2$.
In more complicated cases the parameter cannot usually be eliminated and,
indeed, this should not be expected since parametric representation is more
general than explicit representation.
An important consequence of the parametric representation of a curve is
that increasing the value of the parameter defines a sense of direction along
the curve which is often very useful in more advanced applications of these
ideas. An example of a curve containing a loop is provided by the parametric
equations
$$x = \alpha^3 - \alpha, \qquad y = 4 - \alpha^2 \qquad \text{for } -2 \le \alpha \le 2,$$
which is shown in Fig. 2-14 together with the sense of direction defined by
increasing $\alpha$.
Fig. 2-14 Parameterization of a curve defining sense of direction.
It is implicit in the concept of the parametric representation of a curve
that a given curve may be parameterized in more than one way. Hence
changing the variable in a parameterization will give a different parametric
representation of the same curve. Thus if in the example above we replace
the parameter $\alpha$ by the parameter $\beta$ using the relationship $\alpha = \beta + 1$, then
it is readily seen that
$$x = \beta^3 + 3\beta^2 + 2\beta, \qquad y = 3 - 2\beta - \beta^2 \qquad \text{for } -3 \le \beta \le 1.$$
This is an alternative parameterization of the same curve shown in Fig. 2-14.
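The equivalence of the two parameterizations, and the presence of the loop, can be checked numerically. In the sketch below the function names are hypothetical, and the form $x = \alpha^3 - \alpha$, $y = 4 - \alpha^2$ is the one obtained by substituting $\beta = \alpha - 1$ into the $\beta$-equations quoted above. Integer parameter values are used so that the comparison is exact.

```python
def point_alpha(alpha):
    """Point on the curve for parameter alpha, with -2 <= alpha <= 2."""
    return alpha**3 - alpha, 4 - alpha**2

def point_beta(beta):
    """Point on the same curve for parameter beta = alpha - 1, with -3 <= beta <= 1."""
    return beta**3 + 3 * beta**2 + 2 * beta, 3 - 2 * beta - beta**2

# Both parameterizations give the same points when alpha = beta + 1.
betas = range(-3, 2)
same_points = all(point_beta(b) == point_alpha(b + 1) for b in betas)

# A loop requires the curve to cross itself: two different parameter
# values that give the same point. Here alpha = -1 and alpha = 1 both
# give the point (0, 3).
crossing = (point_alpha(-1), point_alpha(1))

# Listing points for increasing alpha traces the sense of direction.
trace = [point_alpha(a) for a in (-2, -1, 0, 1, 2)]

print(same_points)
print(crossing)
print(trace)
```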
2-6 Functions of several real variables
In physical situations, to say that a quantity depends only on one other
quantity is usually a gross oversimplification. Indeed, this was so in the
thermodynamic illustration used to introduce the notion of a function of one
real variable, because we insisted on maintaining a constant volume of gas.
In general the pressure p of a given gas will depend on both its temperature T
and its volume v. Here we would say that there was a functional relationship
between p, T, and v which, in an implicit form, may be expressed by the
equation
$$f(p, T, v) = 0. \qquad (2\text{-}6)$$
The function f occurring here is a function of three real variables and obviously
depends for its form on the particular gas involved.
Usually one of the three quantities, say p, is regarded as a dependent
variable with the others, namely T and v, being regarded as independent
variables. Solving Eqn (2-6) for p then gives rise to an explicit expression of
the form
$$p = g(T, v), \qquad (2\text{-}7)$$
with g then being called a function of two real variables.
Just as with a function of a single real variable, in addition to specifying
the functional form it is also necessary to stipulate the domain of definition
of the function. Thus Eqn (2-7), which in thermodynamic terms would be
called the equation of state of the gas, would only be valid for some range of
temperature and volume. In this case the reason for the restriction on the
temperature and volume is a physical one, whereas in other situations it is
likely to be a purely mathematical one.
Extending the ideas already introduced we shall now let $R^2$ denote the
set of all ordered pairs (x, y) of real numbers and let S be some subset of $R^2$.
Definition 2-3 We say f is a real valued function of the real variables
x and y defined in set S if, for every $(x, y) \in S$, there is defined a real number
denoted by $f(x, y)$.
As is the case with a function of one variable, when the domain of
definition of a real valued function of two or more real variables is not specified
it is to be understood to be the largest possible domain of definition that can
be defined. Thus, for example, the largest subset $S \subset R^2$ in which the function
$f(x, y) = \sqrt{1 - x^2 - y^2}$ is defined is given by
$$S = \{(x, y) \in R^2 \mid x^2 + y^2 \le 1\}.$$
This concept of a function immediately extends to include functions of
more than two variables. Using $R^n$ to denote the set of all ordered n-tuples
$(x_1, x_2, \ldots, x_n)$ of real numbers of which S is some subset, this definition
can be formulated.
Definition 2-4 We shall say that f is a real valued function of the real
variables $x_1, x_2, \ldots, x_n$ defined in set S if, for every $(x_1, x_2, \ldots, x_n) \in S$,
there is defined a real number denoted by $f(x_1, x_2, \ldots, x_n)$.
A typical example of a function of the three variables x, y, z is provided
by $f(x, y, z) = \sqrt{2 - x} + \sqrt{9 - y^2} + \sqrt{16 - z^4}$. The largest subset
$S \subset R^3$ for which this function may be defined is obviously
$$S = \{(x, y, z) \in R^3 \mid x \le 2;\ -3 \le y \le 3;\ -2 \le z \le 2\}.$$
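The largest domain of the three-variable example follows from requiring each square root to have a non-negative argument. A minimal sketch of this membership test is given below; the function names `in_domain` and `f` are chosen for the illustration only.

```python
import math

def in_domain(x, y, z):
    """True if (x, y, z) lies in S: x <= 2, -3 <= y <= 3, -2 <= z <= 2."""
    return x <= 2 and -3 <= y <= 3 and -2 <= z <= 2

def f(x, y, z):
    """sqrt(2 - x) + sqrt(9 - y^2) + sqrt(16 - z^4), defined only on S."""
    if not in_domain(x, y, z):
        raise ValueError("point lies outside the domain of definition S")
    return math.sqrt(2 - x) + math.sqrt(9 - y**2) + math.sqrt(16 - z**4)

corner_value = f(2, 3, 2)      # every square root vanishes on this corner
interior_value = f(-2, 0, 0)   # sqrt(4) + sqrt(9) + sqrt(16)

print(corner_value, interior_value)
print(in_domain(3, 0, 0), in_domain(0, 0, 2.5))
```

Note that S is unbounded in the negative x direction, since $2 - x$ stays non-negative for all $x \le 2$.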
The geometrical idea underlying the graph of a function of a single variable
also extends to real functions f of two real variables x, y. Denote the value of
the function f at (x, y) by z, so that we may write $z = f(x, y)$. Then with each
point of the (x, y)-plane at which f is defined we have associated a third
number $z = f(x, y)$. Taking three mutually perpendicular straight lines with
a common origin as axes, we may then identify two of the axes with the
independent variables x and y and the third with the dependent variable z.
The ordered number triples $(x, y, z) = (x, y, f(x, y))$ may then be plotted as
points in a three-dimensional geometrical space. The set of points (x, y, z)
corresponding to the domain of definition of the function $f(x, y)$ then define
a surface which, in practice, usually turns out to be smooth. It is conventional
to plot z vertically.
On account of the geometrical representation just described, even in $R^n$
it is customary to speak of the ordered n-tuple of numbers $(x_1, x_2, \ldots, x_n)$
as defining a 'point' in the 'space' $R^n$.
By way of illustration of a graph of a function of two variables we now
consider
$$f(x, y) = \frac{x^2}{4} + \frac{y^2}{9} \quad \text{with} \quad \frac{x^2}{4} + \frac{y^2}{9} \le 2,$$
where the inequality serves to define a domain of definition for the function.
The surface described by this function has the equation $z = x^2/4 + y^2/9$
and the domain of definition is the interior and boundary of the curve
$x^2/4 + y^2/9 = 2$. If this latter expression is rewritten in the form
$x^2/8 + y^2/18 = 1$ then it can be seen that the domain of definition of f is in
fact the interior of an ellipse in the (x, y)-plane having semi-minor axis
$2\sqrt{2}$ and semi-major axis $3\sqrt{2}$, and being centred on the origin. As $f(x, y)$ is
an essentially positive quantity it follows directly that $0 \le z \le 2$ in the domain
of f.
Fig. 2-15 Surfaces and level curves: (a) representation of surface, showing
cross-sections by the planes x = b, y = a, and z = 1; (b) level curves.
To deduce the form of the surface, two further geometrical concepts are
helpful. The first is the notion of the curve defined by taking a cross-section
of the surface parallel to the z-axis. The second is the notion of a contour
line or level curve, defined by taking a cross-section of the surface perpendicular
to the z-axis.
To examine a cross-section of the surface by the plane $y = a$, say, we
need only set $y = a$ in $f(x, y)$ to obtain $z = x^2/4 + a^2/9$, showing that the
curve so defined is a parabola with vertex at a height $z = a^2/9$ above the
y-axis. A similar cross-section by the plane $x = b$ shows that the curve so
defined is $z = b^2/4 + y^2/9$, which is also a parabola, but this time with its
vertex at a height $z = b^2/4$ above the x-axis. (See Fig. 2-15 (a).) If desired,
sections by other planes parallel to the z-axis may also be used to assist
visualization of the surface.
The curve defined by a section of the surface resulting from a cross-section
taken perpendicular to the z-axis is called a contour line or level curve by
direct analogy with cartography, where such lines are drawn on a map to
show contours of constant altitude. Level curves are obtained by determining
the curves in the (x, y)-plane for which z = constant, and it is customary to
draw them all on one graph in the (x, y)-plane with the appropriate value of
z shown against each curve. (See Fig. 2-15 (b).)
Let us determine the level curve in our example corresponding to $z = 1/2$
which is representative of z in the range $0 < z < 2$. We must thus find the
curve with the equation $x^2/4 + y^2/9 = 1/2$, which we choose to rewrite in the
standard form $x^2/2 + y^2/(9/2) = 1$. This shows that it describes an ellipse
centred on the origin with semi-minor axis $\sqrt{2}$ and semi-major axis $3/\sqrt{2}$.
It is not difficult to see that all the level curves are ellipses; the one
corresponding to $z = 2$ being the boundary of the domain of f and the one
corresponding to $z = 0$ degenerating to the single point at the origin.
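For the surface $z = x^2/4 + y^2/9$ the level curve at height $z = c$ is $x^2/(4c) + y^2/(9c) = 1$, an ellipse with semi-axes $2\sqrt{c}$ and $3\sqrt{c}$. The sketch below tabulates these semi-axes and confirms that points on the ellipse really do lie at height c; the function names and the sample angles are assumptions made for the illustration.

```python
import math

def surface(x, y):
    """Height z = x^2/4 + y^2/9 of the surface above the point (x, y)."""
    return x**2 / 4 + y**2 / 9

def level_curve_semi_axes(c):
    """Semi-axes (along x, along y) of the level curve z = c, for 0 <= c <= 2."""
    if not 0 <= c <= 2:
        raise ValueError("c must lie in [0, 2], the range of z on the stated domain")
    return 2 * math.sqrt(c), 3 * math.sqrt(c)

# Level curve z = 1/2: semi-minor axis sqrt(2), semi-major axis 3/sqrt(2).
a, b = level_curve_semi_axes(0.5)

# Points (a cos t, b sin t) on that ellipse should all lie at height 1/2.
points = [(a * math.cos(t), b * math.sin(t)) for t in (0.0, 1.0, 2.5)]
heights = [surface(x, y) for x, y in points]

boundary_axes = level_curve_semi_axes(2)   # boundary of the domain
origin_axes = level_curve_semi_axes(0)     # degenerate curve: a single point

print(a, b)
print(heights)
print(boundary_axes, origin_axes)
```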
PROBLEMS
Section 2-1
2-1 Sketch the graphs of these functions:
(a) $f(x) = x^2 - 3x + 2$ $(-1 < x < 3)$;
(b) $f(x) = x + \sin x$ $(-\pi/2 < x < \pi/2)$;
(c) $f(x) = x^3$ $(-2 < x < 2)$;
(d) $f(x) = x^2 + 1/x$ $(0.2 < x < 2)$;
(e) $f(x) = x + 1/x^2$ $(0.5 < x < 5)$.
2-2 Determine the domain and the range of each of functions (a) to (e) defined
above.
2-3 Determine the range of the function $f(x) = x^2 + 1$ corresponding to each of
the following domains and state when the mapping is one-one and when it is
many-one:
(a) [-1,1]; (b) (2,4); (c) [-2,4]; (d) [-3,1].
2-4 Find the largest domain of definition for each of the following functions:
(a) $f(x) = x^3 + 3$; (b) $f(x) = x^2 + \sqrt{1 - x^2}$;
(c) $f(x) = x^2 + \sqrt{1 - x^3}$; (d) $f(x) = 1/(x^2 - 1)$;
(e) $f(x) = x + 1/x$; (f) $f(x) = x^2/(1 + x^2)$.
2-5 Let $f(n)$ denote the function that assigns to any positive integer n the number of
positive integers whose square is less than or equal to n + 2. By enumerating
the first few values of $f(n)$ deduce the values of n for which $f(n) = 3$.
2-6 An integer m is said to be a prime number if its only factors are 1 and m.
Given that $f(n)$ is the function that associates with n the number of primes less
than or equal to 2n + 1, enumerate the first ten values of $f(n)$.
2-7 Give two examples of functions which are defined only for discrete values of
the dependent and independent variables.
2-8 Sketch representative members of the two pencils of lines described by
$y = \alpha(x - 1) + 2$ and $y = \beta(x - 2) + 3$, where $\alpha$ and $\beta$ are parameters.
Locate the two singular points and suggest how $\alpha$ and $\beta$ may be used as
coordinates for points in the plane of the two pencils. When will the coordinates
$\alpha$ and $\beta$ fail to identify points?
2-9 Suppose that f is the function that assigns to every qualified driver the name of
the driving examiner who issued his licence. Identify the domain A and the
range B of f, stating the nature of the mapping involved.
2-10 Give two examples of functions relating non-numerical quantities.
Section 2-2
2-11 Sketch the graphs of the following functions in their stated domains of
definition and in each case use the process of reflection in the line $y = x$ to
construct the graph of the inverse function:
(a) $f(x) = x^3$ with $x \in [-2, 2]$;
(b) $f(x) = x + \sin x$ with $x \in [0, \pi/2]$;
(c) $f(x) = x/(1 + x^2)$ with $x \in [-1, 2]$.
2-12 Where appropriate, classify the following functions as either monotonic or
strictly monotonic increasing or decreasing on the stated domains of
definition:
(a) $f(x) = x^2$ for $x \in [-1, 2]$;
(b) $f(x) = x^2$ for $x \in [-1, 0)$;
(c) $f(x) = \sin x$ for $x \in [-3\pi/4, \pi/4]$;
(d) $f(x) = \cos x$ for $x \in [0, \pi]$;
(e) $f(x) = \tan x$ for $x \in [-\pi/4, \pi/4]$;
(f) $f(x) = x$ for $x \in [0, 1]$, $f(x) = 1$ for $x \in (1, 2]$, $f(x) = x^2/4$ for $x \in (2, 6]$;
(g) $f(x) = x$ for $x \in [1, 2)$, $f(x) = x^2$ for $x \in [2, 4]$.
2-13 Complete the entries in this table:
f , _. Is mapping f' 1 when it
■^ ' one-one exists
X
[-3,1]
X 3
1/(1 + X)
[1,3]
sin x
[-K W
cos (a: + Jn)
[O.ir]
tan [x — \tt\
[0, \n\
(2,4]
Section 2-3
2-14 Sketch these functions in their associated domains of definition:
(a) $f(x) = |2x|$ for $x \in [-2, 2]$;
(b) $f(x) = x + |x|$ for $x \in [-2, 2]$;
(c) the step function assuming the values 1, 2, -3, 2, 4 on the x intervals
[0, 1), [1, 2], (2, 3.5), [3.5, 4], and (4, 5], respectively. Identify end points
belonging to a line by a dot and end points deleted from a line by a circle.
(d) $f(x) = |x|$ for $x \in [0, 1)$, $f(x) = |x - 1|$ for $x \in [1, 2)$,
$f(x) = |x - 2|$ for $x \in [2, 3]$.
2-15 Where appropriate, classify the following functions as even or odd:
(a) $f(x) = x + |x|$;
(b) $f(x) = x + \sin 2x$;
(c) $f(x) = x^2 + \sin x$;
(d) $f(x) = 1/x$;
(e) $f(x) = x^2/(1 + x^2)^2$;
(f) $f(x) = x^5 - x^3 + x$;
(g) $f(x) = 2\cos x + \sin x$.
It is obvious that any arbitrary function $f(x)$ which is defined in an interval
J containing the origin may be written in the form
$$f(x) = \tfrac{1}{2}(f(x) + f(-x)) + \tfrac{1}{2}(f(x) - f(-x)),$$
in any interval $I \subset J$ that is symmetric about the origin. Such an interval I
is said to be interior to J. This shows that any such $f(x)$ is expressible as the
sum of an even function $\tfrac{1}{2}(f(x) + f(-x))$, and an odd function $\tfrac{1}{2}(f(x) - f(-x))$
within I. Apply this result to display the following functions as the sum of
even and odd parts, in each case stating the largest interval I for which the
result is true:
(h) $f(x) = 1 + x^3 + x \sin x$ for $-2\pi < x < 3\pi$;
(i) $f(x) = 1 + x + |x| \sin x$ for $-3\pi < x < 3\pi$;
(j) $f(x) = 1 - x + 2x^2 + 4x^3$ for $-4 < x < 3$.
2-16 Determine if upper and lower bounds exist for the following functions and,
when appropriate, state their values and where they occur on the respective
domains of definition:
(a) $f(x) = 1/x$ for $x \in [1, 4]$;
(b) $f(x) = 1/x$ for $x \in (0, 3]$;
(c) $f(x) = 1 + x^2$ for $x \in [-2, 1]$;
(d) $f(x) = \sin x$ for $x \in [0, 3\pi/2]$;
(e) $f(x) = \tan x$ for $x \in (-\pi/2, \pi/2)$.
2-17 The pairs of numbers enclosed by the curly brackets following each problem
are upper and lower bounds for the associated function in its stated domain
of definition. State whether or not each of these bounds is strict:
(a) $f(x) = x^3 + x + 1$ with $x \in [1, 2]$, {0, 11};
(b) $f(x) = \sin x$ with $x \in [0, \pi/2]$, {0, 2};
(c) $f(x) = 1/(1 + x^2)$ with $x \in [0, 2]$, {1/6, 2};
(d) $f(x) = \sin(1/x)$ with $x \in [2/\pi, 30]$, {0, 1}.
2-18 Determine by sketching whether the following functions are convex, concave
or neither on their stated domains of definition:
(a) $f(x) = x^3$ for $x \in [1, 3]$;
(b) $f(x) = x^3$ for $x \in [-1, 1]$;
(c) $f(x) = a^2 - x^2$ for $x \in [-a/2, a]$;
(d) $f(x) = x + \sin x$ for $x \in [0, \pi/2]$;
(e) $f(x) = \sin x$ for $x \in [0, \pi]$.
2-19 Give examples of polynomials of degrees 3, 4, and 5 and of a rational function
having a numerator of degree 2 and a denominator of degree 5.
2-20 Classify the following functions as polynomial, rational, or algebraic. When
the function is algebraic, state the degrees of x and y in the polynomial that is
involved after the surds and fractions have been cleared:
(a) $y = x^3 - x^2 + 1$;
(b) $y = x\sqrt{3 - x}$;
(c) $y = (x - 1)/(x^4 + 3x^3 - x^2 + x + 1)$;
(d) $y = x + 3\sqrt{x^2 - 2}$;
(e) $y = (x^3 - 3x + 2)/(x - 1)$.
Section 2-4
2-21 Complete the entries in the following table by determining whether the
functions f map the stated domains A 'into' or 'onto' the domains B.
    f                A                  B                        Into or onto mapping
    $x^3$            $[1, 3]$           $[0, 30]$
    $x + \sin x$     $[0, \pi/2]$       $[0, (2 + \pi)/2]$
    $x^2$            0,4]               $[1, 16]$
    $x^4$            $[-1, 2]$          $[0, 16]$
2-22 Given that $f(x) = 2x - 7$ with domain $(-\infty, 20]$ and $g(x) = 10 - x$ with
domain $[-6, \infty)$, determine the domain and the range of the composition
$f \circ g$.
2-23 Given that $f(x) = x^2 + 1$ with domain $(-\infty, \infty)$ and $g(x) = 2 + \sqrt{4 - x}$
with domain $[-5, 4]$, determine the domain and the range of the composition
$f \circ g$.
Section 2-5
2-24 Draw the circles corresponding to a = |, J, \, 1, and 2 in the equation
(x — l) 2 + (y — a) 2 = a 2 /(l + a 2 ) and sketch the envelope indicating its
asymptotes for large positive and negative a.
2-25 Draw the circles corresponding to a = J, J, 1, 2, and 3 in the equation
(x — a) 2 + j 2 = I a 2 and draw the envelope.
2-26 Deduce the envelope of the family of circles $(x - \alpha)^2 + y^2 = \alpha^2$, with
parameter $\alpha$.
2-27 Sketch representative ellipses belonging to the family $x^2/\alpha^2 + y^2/(4 - \alpha)^2 = 1$,
with parameter $\alpha$ and deduce the shape of the envelope.
2-28 Draw representative members of the family of straight lines $y = \alpha x + 2/\alpha$,
with parameter $\alpha$, and deduce the shape of the envelope.
2-29 Sketch the curve represented by the parametric equations $x = 2\cos\alpha$,
$y = \sin\alpha$ for $-\pi/2 < \alpha < \pi/2$.
2-30 Sketch the curve represented by the parametric equations $x = \alpha^2 + 1$, $y = \alpha^3$
for $-2 < \alpha < 2$.
2-31 Sketch the curve represented by the parametric equations $x = \alpha^3 + \alpha^2 - 2\alpha$,
$y = 5 - \alpha^2$ for $-3 < \alpha < 2$. Indicate by arrows on the curve the sense of
direction corresponding to increasing $\alpha$.
2-32 Sketch the curve represented by the parametric equations $x = \cos\alpha +
4\cos(\alpha/3)$, $y = \sin\alpha + 4\sin(\alpha/3)$ for $0 < \alpha < 3\pi/2$. Use arguments
involving even and odd functions to deduce the form taken by the curve for
$0 < \alpha < 6\pi$.
2-33 Suggest two different parametric representations for the curve $y = x^2 + x + 1$
for $0 < x < 2$.
Section 2-6
2-34 What are the largest domains of definition for the following functions of
several variables:
(a) $f(x, y) = 1 + x^2 + y^2$;
(b) $f(x, y) = (x^2 + y^2)\sqrt{1 - x^2 - y^2}$;
(c) $f(x, y) = \sin xy/(x^2 + y^2 + 1)$;
(d) $f(x, y) = 3x^2 + y^2 + \sqrt{2 - y} + \sqrt{4 - x^2}$;
(e) $f(x, y, z) = \sqrt{3 - x} + x\sqrt{9 - y} + y\sqrt{1 - z^2}$;
(f) $f(x, y, z) = \sqrt{x^2 + y^2 - 1} + \sqrt{4 - x^2 - y^2 - z^2}$.
2-35 The function $f(x, y) = x^2 y$ has for its domain of definition the rectangle in
the (x, y)-plane defined by $|x| < 3$, $|y| < 2$. Deduce the shape of the curves
defined by cross-sections of the surface $z = f(x, y)$ taken by the three planes
$x = -2$, $x = 0$, and $x = 2$ that are parallel to the (y, z)-axes and by the three
planes $y = -2$, $y = 0$, and $y = 2$ that are parallel to the (x, z)-axes, using
your results to sketch the surface. Sketch on one diagram the level curves
corresponding to $z = -4$, $z = -2$, $z = 0$, and $z = 6$.
2-36 Sketch the surface $z = f(x, y)$ defined by the function $f(x, y) = 1/(1 + x^2
+ y^2)$ in the domain $|x| < 4$, $|y| < 4$. Draw the level curves corresponding
to $z = 1/9$, $z = 1/3$, $z = 2/3$, and $z = 1$.
2-37 The surface $z = f(x, y)$ is defined by the function $f(x, y) = 1/[(x - 1)^2
+ (y - 2)^2 - 1]$ with $2 < (x - 1)^2 + (y - 2)^2 < 9$. Deduce the domain of
definition of the function and then sketch the level curves corresponding to
z = i, 2 = i, and z = f on the same diagram. Use your result to sketch the
surface. [Hint: Use the fact that the circle of radius p with centre at (a, b) has
the equation $(x - a)^2 + (y - b)^2 = p^2$.]
Sequences, limits, and
continuity
3-1 Sequences
The notion of a 'sequence' is a constantly recurring one in everyday life,
where it usually implies the ordering of some set of events with respect to
time. The sets of events that are so ordered, or arranged, are very varied and
may be either numerical or non-numerical in nature. Typical examples of
commonplace sequences in these categories are these:
(a) the sequence of months in a year ;
(b) the sequence of digits identifying a telephone subscriber;
(c) the sequence of machining operations required to make a certain
component.
However, sequences are not necessarily decided by the chronological
order of events and they are often determined instead by some attribute
possessed by the members of the set to be ordered. Thus, for example, two
commonly occurring sequences to be found in any library are the entries in the
alphabetic catalogues of authors and titles, neither of which are in the
chronological order of acquisition of the books. Although these general ideas
could be discussed at greater length, such an examination is inappropriate
here, and it must suffice that these few examples show that sequences are
commonplace in the world around us, and that they need not necessarily
involve numbers.
These ideas find an immediate parallel in mathematics, where the natural
order existing in R combined with the arithmetic properties discussed in
Chapter 1 enables us to deal very successfully and in great detail with ques-
tions relating to mathematical sequences. Our main pre-occupation in this
book will be with sequences of numbers and sequences of functions so we
must first make the mathematical notion of a sequence more precise. Before
doing this, however, we must issue a word of warning about the words
sequence and series. Colloquially they are often used interchangeably, but in
mathematics they have two quite different meanings which must never be
confused. In brief, in mathematical terms a sequence is a set of quantities
that is enumerated in a definite order, whereas a series involves the sum of a
set of quantities. Thus $1, 3, 5, 7, 9, \ldots$ is a sequence but
$1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{16} + \cdots$ is a series.
If a sequence is composed of elements or terms $u$ belonging to some set $S$,
then it is conventional to indicate their order by adding a numerical suffix
to each term. Consecutive terms in the sequence are usually numbered
sequentially, starting from unity, so that the first few terms of a sequence
involving $u$ would be denoted by $u_1, u_2, u_3, \ldots$. Rather than write out a
number of terms in this manner this sequence is often represented by $\{u_n\}$,
where $u_n$ is the $n$th term of the sequence. The sequence depends on the set
chosen for $S$ and the way suffixes are allocated to elements of $S$. A sequence
will be said to be infinite or finite according as the number of terms it contains
is infinite or finite and, unless explicitly stated, all sequences will be assumed
to be infinite. The notation for a sequence is often modified to $\{u_n\}_{n=1}^{N}$ when
only a finite number $N$ of terms is involved.
As an example of an infinite numerical sequence, let $S$ be the set of real
numbers and the rule by which suffixes are allocated be that to each integer
suffix $n$ we allocate the number $1/2^n$ which belongs to R. We thus arrive at the
infinite sequence $u_1 = 1/2$, $u_2 = 1/2^2$, $u_3 = 1/2^3, \ldots$, which could either be
written in the form
\[ \frac{1}{2}, \frac{1}{2^2}, \frac{1}{2^3}, \ldots, \frac{1}{2^n}, \frac{1}{2^{n+1}}, \ldots \]
or, more concisely, in the form
\[ \{1/2^n\}. \]
Had the set $S$ still been the set R of real numbers, but the rule of allocation
of suffixes been changed, so that to each integer suffix $n$ chosen from the first
$N$ natural numbers we allocated the number $1/(2n + 1)$, then the finite
sequence
\[ \frac{1}{3}, \frac{1}{5}, \frac{1}{7}, \ldots, \frac{1}{2N + 1} \]
would have resulted.
If we use the notion of a function $f(x)$ which is defined only for integral
values of the argument $x$, the following concise definition can be formulated.
Definition 3-1 In mathematical terms a sequence is a function $f$ defined
only for integer values of its argument and having for its range an arbitrary
set $S$.
Hence the first sequence that was displayed could be regarded as resulting
from the function $f(x) = 1/2^x$ with $u_n = f(n)$, where $n$ is always a positive
integer. By exactly similar reasoning, the second sequence can be derived
from the function $f(x) = 1/(2x + 1)$ by setting $u_n = f(n)$.
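The definition can be mirrored directly in code. The following is a minimal sketch, assuming Python and exact rational arithmetic; the names `f`, `g`, and `first_terms` are illustrative choices and do not come from the text.

```python
from fractions import Fraction


def f(x: int) -> Fraction:
    """Rule generating the infinite sequence {1/2^n}: u_n = f(n)."""
    return Fraction(1, 2 ** x)


def g(x: int) -> Fraction:
    """Rule generating the sequence {1/(2n + 1)}: u_n = g(n)."""
    return Fraction(1, 2 * x + 1)


def first_terms(rule, count: int) -> list:
    """Return [u_1, ..., u_count] for the sequence defined by `rule`."""
    return [rule(n) for n in range(1, count + 1)]


halves = first_terms(f, 4)  # 1/2, 1/4, 1/8, 1/16
odds = first_terms(g, 3)    # the finite sequence with N = 3: 1/3, 1/5, 1/7
print(halves, odds)
```

Only integer arguments are ever passed to the rules, which is exactly the restriction on the domain made in the definition.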
The connection between functions and sequences that is established in
this definition makes it appropriate to describe numerical sequences in the
same terms as would be used to describe the function giving rise to them.
Thus if the terms of a sequence $\{u_n\}$ are such that $m < u_n < M$ for all values
of $n$ then the sequence is said to be bounded, whilst if $u_{n+1} > u_n$ for all $n$
then the sequence is said to be strictly monotonic increasing. The terms bounded
above, bounded below, unbounded, strictly monotonic decreasing, monotonic,
and oscillating, etc., can also be used in the obvious manner as shown below.
Example 3-1
(a) $\{1/n\}_{1}^{\infty}$ is a bounded, strictly monotonic decreasing sequence.
The upper bound 1 is strict but the lower bound 0 is never actually attained.
(b) $\{1/\sin(1/n)\}_{1}^{\infty}$ is a strictly monotonic increasing sequence, strictly
bounded below by $(\sin 1)^{-1}$ but unbounded above.
(c) $\{(-1)^n/n\}_{1}^{\infty}$ is a bounded sequence with strict upper bound $\tfrac{1}{2}$ and
strict lower bound $-1$.
(d) $\{u_n\}_{1}^{\infty}$ where $u_{2m-1} = m/(m + 1)$ and $u_{2m} = u_{2m-1}$. The first
six terms of this sequence are $\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{2}{3}, \tfrac{2}{3}, \tfrac{3}{4}, \tfrac{3}{4}$, corresponding
pairwise, respectively, to $m = 1$, 2, and 3. The sequence is thus both bounded
and monotonic increasing. It is not strictly monotonic increasing because
pairs of terms are equal. The lower bound $\tfrac{1}{2}$ is strict, but the upper bound 1
is never actually attained.
(e) $\{(-1)^n\}$ is an oscillating but bounded sequence with strict
upper bound 1 and strict lower bound $-1$.
(f) $\{(-2)^n\}$ is an oscillating but unbounded sequence.
Just as a graph proved to be useful when representing functions, so also
may it be used to represent sequences. Exactly the same method of
representation can be adopted, but this time, since the domain of the function
defining the sequence is the set of natural numbers, the graph of a sequence
will be a set of isolated points. A typical example is the graph of the first
few terms of the sequence $\{u_n\}$ with $u_n = [n + (-1)^n]/n$ which are shown as
dots in Fig. 3-1 (a).
An obvious deficiency of this representation is that the horizontal axis
must be made unreasonably long if a large number of terms are to be
represented. This can be overcome by the following simple device which is
sometimes of use since it compresses the representation of numbers 1 to infinity
onto a line of finite length. The idea is illustrated in Fig. 3-1 (b) where, on the
horizontal axis, the integer $n$ is associated with a point distant $1/n$ to the left
of a fixed point $P$. The left end point of the line segment is then associated
with the value 1, the mid-point with the value 2, and so on, with the point $P$
itself corresponding to an infinite value of $n$.
An even simpler graphical representation than either of these is often
used in which the values of successive terms in the sequence are plotted
one-dimensionally as points on a straight line relative to some fixed origin.
Because of the identification of the numerical value of a term of the sequence
with a point on a line, the behaviour of a sequence is often spoken of in terms
of the behaviour of the points in this representation (that is, there is a one-one
mapping of $\{u_n\}$ onto the straight line). In terms of this representation, the same
sequence that gave rise to Fig. 3-1 (a) and (b) will appear as in Fig. 3-2. This
could also have been obtained from Fig. 3-1 (a) and (b) by projecting the
points of the graphs horizontally across to meet the vertical axis.
Fig. 3-1 Two alternative graphs of the sequence $\{1 + (-1)^n/n\}$: (a) normal graph;
(b) compressed horizontal axis.
Fig. 3-2 Sequence $\{1 + (-1)^n/n\}$ plotted on a line; all points $u_n$ for $n > 5$
lie in the marked neighbourhood.
In each of these three representations, the tendency for the points of the
sequence $\{1 + (-1)^n/n\}$ to cluster around the value unity as $n$ increases is
obvious and clearly expresses an important property possessed by the sequence.
We shall now explore this more fully.
In the sequence just discussed it is obvious that as n increases, so the
points of the sequence cluster ever closer to the unit point in Fig. 3-2. If we
adopt the convention of calling an open interval (a, b) containing some fixed
point a neighbourhood of that point, then it is not difficult to see that any
neighbourhood of the point unity will contain an infinite number of points
of the sequence $\{u_n\}$. In fact in this case we can assert that no matter how small
the length $b - a$ of the neighbourhood, there will always be an infinite number
of points in $(a, b)$ and only a finite number of points outside
$(a, b)$. This is even true when $b - a$ shrinks virtually to zero!
The fact that any neighbourhood of the value unity has the property that
an infinite number of points of the sequence are contained within it, whereas
only a finite number of points lie without it, is recognized by saying that the
limit of the sequence is unity. On account of this name the point corresponding
to the value unity in Fig. 3-2 is called a limit point of the sequence. We shall
examine the idea of a limit in the next section, and so for the moment will
confine discussion to limit points. For this we shall require the notion of a
sub-sequence. Henceforth, by a sub-sequence we shall mean a sequence
$u_{n_1}, u_{n_2}, \ldots, u_{n_m}, \ldots$ of terms belonging to the sequence $\{u_n\}$, where
$n_1, n_2, \ldots, n_m, \ldots$ is some numerically ordered set of integers selected
from the complete set of natural numbers. Thus $u_1, u_9, u_{27}, u_{31}, \ldots$ is a
sub-sequence of $u_1, u_2, u_3, \ldots$ and obviously $\{u_1, u_9, u_{27}, u_{31}, \ldots\} \subset \{u_n\}$.
In terms of this we now give the following formal definition of a limit
point of a sequence $\{u_n\}$.
Definition 3-2 A point $u^*$ is said to be a limit point of the sequence $\{u_n\}$
if every neighbourhood of $u^*$ contains an infinite number of elements of
the sequence $\{u_n\}$.
Since we have not insisted that there be a finite number of points outside any
neighbourhood of a limit point it follows that a sequence may have more than
one limit point. We shall show by example that a limit point may or may not
be a member of the sequence that defines it. This result when applied to
sequences with only one limit point will later be seen to be very important,
since it provides the justification for the approximation to irrational numbers
in calculations by rational numbers. In sequences involving only one limit
point the sequence will be said to converge to the value associated with the
limit point. This value will be called the limit of the sequence.
Not all sequences have limit points and the following examples exhibit
sequences having three, one, and no limit points, respectively.
Example 3-2
(a) $\left\{\sin\dfrac{\pi(n^2 + 1)}{2n}\right\}$ has the three limit points $-1$, $0$, and $1$, of which
$0$ is a member of the sequence and the other two are not. The sequence does
not converge.
(b) $\left\{\dfrac{1}{n}\sin\left(\dfrac{n\pi}{2}\right)\right\}$ has only one limit point at zero which is a member
of the sequence. The sequence converges to zero.
(c) $\{n^2\}$ has no limit point and so the sequence does not converge.
One of the most important applications of the notion of a sequence is to
the study of series. The difficulty here is to give a meaning to the sum of an
infinite number of terms. What, for example, is the meaning of
\[ \sum_{n=1}^{\infty} \frac{1}{n!}\,? \tag{A} \]
The solution is to be found in the behaviour of the sequence $\{s_m\}$ defined by
\[ s_m = \sum_{n=1}^{m} \frac{1}{n!}. \]
The first few terms of the sequence $\{s_m\}$ are
\[ s_1 = 1, \quad s_2 = 1 + \frac{1}{2!}, \quad s_3 = 1 + \frac{1}{2!} + \frac{1}{3!}, \quad s_4 = 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!}, \]
and obviously all such terms $s_m$ will only involve the sum of a finite number
of numbers. For obvious reasons $s_m$ is called the $m$th partial sum of the series
(A). The interpretation of the infinite sum (A) is to be found in the behaviour
of the $N$th term of $\{s_m\}$, namely the $N$th partial sum $s_N$, as $N$ tends to infinity.
If $\{s_m\}$ has only one limit point at which $s_m$ tends to some number $S$, then this
will be called the sum of the series. If $S$ is infinite the series will be said to
diverge. A moment's reflection will show the reader that this is the practical
approach to the problem, since the term $s_N$ is the sum of the first $N$ terms of
the infinite series (A), and it seems reasonable to assume that when the value
of (A) is finite, it must be close to the value $s_N$, when $N$ is suitably large.
These preliminary ideas on series must suffice for now, but we shall take
them up again later and devise tests to determine whether series are convergent
or divergent.
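The behaviour of the partial sums can be examined numerically. The sketch below assumes that series (A) is the sum of $1/n!$ for $n = 1, 2, \ldots$ (our reading of the display above) and uses exact fractions; the name `partial_sum` is an illustrative choice. The comparison value $e - 1$ is a standard result and is not derived in this section.

```python
from fractions import Fraction
from math import e, factorial


def partial_sum(m: int) -> Fraction:
    """s_m = 1/1! + 1/2! + ... + 1/m!, computed exactly."""
    return sum(Fraction(1, factorial(n)) for n in range(1, m + 1))


# The first ten partial sums s_1, ..., s_10.
sums = [partial_sum(m) for m in range(1, 11)]
for m, s in enumerate(sums, start=1):
    print(m, s, float(s))

# Assumption (standard result): the sum of this series is e - 1.
print(float(partial_sum(15)), e - 1)
```

Each $s_m$ is a finite sum, and the printed values settle down rapidly, which is the behaviour the text describes for a convergent series.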
3-2 Limits of sequences
The term limit was first introduced intuitively in the previous section in
connection with a sequence $\{u_n\}$ which had only one limit point. As $n$ increases so
the points representing the terms $u_n$ cluster ever closer to the limit point
whose value $L$, say, is the limit of the sequence. This idea of a limit is correct
in spirit but it is not very satisfactory from the mathematical manipulative
point of view since the phrase 'cluster ever closer to' is far too vague. The
difficulty of making the expression 'limit' precise is connected with the exact
meaning we give to this phrase.
Our difficulty can be resolved if we recall that any neighbourhood of a
limit point will contain an infinite number of points of the sequence and,
if there is only one limit point, will exclude only a finite number of points!
Thinking in terms of numbers rather than points, a neighbourhood of a limit
point is simply an open interval of the line on which the numbers $u_n$ are
plotted and we already have a notation for representing such an interval.
Suppose, for convenience, that the neighbourhood is symmetrical about the
number $L$ and of width $2\varepsilon$, where $\varepsilon$ is some arbitrarily small positive number.
Then a variable $u$ will be inside this neighbourhood if $L - \varepsilon < u < L + \varepsilon$.
Recalling the definition of 'absolute value', this inequality can be rewritten
concisely as $|u - L| < \varepsilon$. Different values of $\varepsilon > 0$ determine different
neighbourhoods, and if $u$ is identified with the term $u_n$ of the sequence, then
$L$ is the limit of the sequence if, no matter how small $\varepsilon$ may become, only a
finite number of terms $u_n$ lie outside the neighbourhood and an infinite
number lie within it.
We can now give a proper definition of a limit.
Definition 3-3 The sequence $\{u_n\}$ will be said to tend to the limit $L$ if,
and only if, for any arbitrarily small positive number $\varepsilon$, there exists an integer
$N$ such that
\[ n > N \Rightarrow |u_n - L| < \varepsilon. \]
Let us test our definition on the sequence $\{u_n\}$ with $u_n = 1 + (-1)^n/n$.
We already know that this sequence has only one limit point at the value
unity, and consequently our definition should show that the limit is unity.
Suppose, for the sake of argument, that we check to see that the definition is
satisfied if $\varepsilon = 1/100$. To do this we must find a number $N$ such that when
$n > N$ we have
\[ \left| \left(1 + \frac{(-1)^n}{n}\right) - 1 \right| < \frac{1}{100}. \]
This result is obviously equivalent to the requirement that $(1/n) < 1/100$
which will be true for any value of $n$ greater than 100. Hence if we take
$N = 100$ the conditions of the definition are satisfied. There are thus 100
terms outside the neighbourhood and an infinite number within it.
Had we demanded a much smaller value of $\varepsilon$, say $\varepsilon = 10^{-6}$, the identical
argument would have shown that the definition is satisfied if $N = 10^6$.
There would now be a very large number of terms outside the neighbourhood
$0.999999 < u_n < 1.000001$, in fact $10^6$ in all, but this is still a finite number
whereas the number of terms within the neighbourhood is still infinite.
Clearly, however small the value of $\varepsilon$, the conditions of the definition will still
apply showing that it is in accord with our earlier intuitive ideas.
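The search for $N$ in this test can be sketched in code. The following assumes Python with exact fractions so that the boundary case $1/n = \varepsilon$ is handled without rounding; the names `u`, `cutoff`, and `holds_beyond` are illustrative choices, and the finite check is only evidence, not a proof.

```python
from fractions import Fraction
from math import floor


def u(n: int) -> Fraction:
    """General term u_n = 1 + (-1)^n / n."""
    return 1 + Fraction((-1) ** n, n)


def cutoff(eps: Fraction) -> int:
    """Smallest N with n > N => |u_n - 1| < eps, using |u_n - 1| = 1/n."""
    return floor(1 / eps)


def holds_beyond(N: int, eps: Fraction, span: int = 1000) -> bool:
    """Check |u_n - 1| < eps for the `span` terms following u_N."""
    return all(abs(u(n) - 1) < eps for n in range(N + 1, N + 1 + span))


for eps in (Fraction(1, 100), Fraction(1, 10 ** 6)):
    N = cutoff(eps)
    print(eps, N, holds_beyond(N, eps))
```

For $\varepsilon = 1/100$ the term $u_{100}$ lies exactly on the boundary of the neighbourhood, which is why $N = 100$ rather than $N = 99$ is needed.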
In general, when the sequence $\{u_n\}$ has a limit $L$, so that we say it converges
to $L$, we shall write
\[ \lim_{n \to \infty} u_n = L. \]
Whenever using this notation for a limit the reader must always keep in
mind the underlying formal definition just given.
The definition and the illustrative example just given show that when a
sequence has only one limit point, then it must converge to the value
associated with that limit point. Any sequence such as $\{u_n\}$ with
$u_n = \sin\{\pi(n^2 + 1)/2n\}$ cannot have a limit, for it has three limit points at
$-1$, $0$, and $1$ and any small neighbourhood taken about any one must, of
necessity, exclude the infinitely many terms associated with the other two.
Such a sequence does not converge.
Frequently the limit of a sequence is of more importance than its individual
terms, and in such circumstances the notation $\lim u_n$ is advantageous in that
it focusses attention on the general term $u_n$ of the sequence. The result of the
limiting operation is often readily deduced from the general term as these
examples indicate.
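Example 3-3 Determine the limits in each of the following:
(a) $\lim_{n \to \infty} \left[\dfrac{(2n - 1)(n + 4)(n - 2)}{n^3}\right]$;
(b) $\lim_{n \to \infty} \left[\dfrac{1}{n^2} + \dfrac{2}{n^2} + \cdots + \dfrac{n - 1}{n^2}\right]$;
(c) $\lim_{n \to \infty} \left[\dfrac{5^{n+1} + 7^{n+1}}{5^n - 7^n}\right]$;
(d) $\lim_{n \to \infty} \left[\dfrac{1^2 + 2^2 + 3^2 + \cdots + n^2}{n^2}\right]$.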
Solution (a) The general term is $u_n = [(2n - 1)(n + 4)(n - 2)]/n^3$, so that
expanding the numerator and dividing by $n^3$ gives
\[ u_n = 2 + \frac{3}{n} - \frac{18}{n^2} + \frac{8}{n^3}. \]
Obviously, as $n$ increases, the last three terms comprising $u_n$ approach zero,
and in the limit we have
\[ \lim_{n \to \infty} \left[\frac{(2n - 1)(n + 4)(n - 2)}{n^3}\right] = 2. \]
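Solution (b) The general term is $u_n = [1 + 2 + \cdots + (n - 1)]/n^2$, in
which the numerator is the sum of an arithmetic progression. Now it is
readily verified that $1 + 2 + \cdots + (n - 1) = n(n - 1)/2$ so that
\[ u_n = \frac{n(n - 1)}{2n^2} = \frac{1}{2}\left(1 - \frac{1}{n}\right). \]
Using the same argument as in (a) above we see at once that as $n$ increases
so $u_n$ approaches the value $\tfrac{1}{2}$, whence
\[ \lim_{n \to \infty} \left[\frac{1}{n^2} + \frac{2}{n^2} + \cdots + \frac{n - 1}{n^2}\right] = \frac{1}{2}. \]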
Solution (c) The general term here is $u_n = (5^{n+1} + 7^{n+1})/(5^n - 7^n)$ and by
dividing numerator and denominator by $7^n$ it may be written:
\[ u_n = \frac{5(5/7)^n + 7}{(5/7)^n - 1}. \]
Now $5/7 < 1$ so that $(5/7)^n$ will tend to zero as $n$ increases. Thus $u_n$ will
approach the value $-7$. In this case we may write
\[ \lim_{n \to \infty} \left[\frac{5^{n+1} + 7^{n+1}}{5^n - 7^n}\right] = -7. \]
Solution (d) The general term is $u_n = [1^2 + 2^2 + \cdots + n^2]/n^2$, in which
the numerator is the sum of the squares of the first $n$ natural numbers. Using
the familiar result
\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6} \]
enables us to write
\[ u_n = \frac{(n + 1)(2n + 1)}{6n}. \]
It is obvious that the numerator is quadratic in $n$ whereas the denominator
is first degree or linear in $n$. Hence as $n$ increases without bound, so will $u_n$.
This sequence diverges and we write
\[ \lim_{n \to \infty} \left[\frac{1^2 + 2^2 + \cdots + n^2}{n^2}\right] \to \infty. \]
Notice that we do not use the equality sign in connection with the symbol
$\infty$, in accordance with the idea that infinity is not an actual number but
essentially a limiting process.
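The four results of Example 3-3 can be checked numerically. This is a rough sketch, assuming Python with exact fractions to avoid rounding in the large powers; the function names `a`, `b`, `c`, and `d` simply mirror the parts of the example and are our own labels.

```python
from fractions import Fraction


def a(n: int) -> Fraction:
    return Fraction((2 * n - 1) * (n + 4) * (n - 2), n ** 3)


def b(n: int) -> Fraction:
    # Numerator is 1 + 2 + ... + (n - 1).
    return Fraction(sum(range(1, n)), n ** 2)


def c(n: int) -> Fraction:
    return Fraction(5 ** (n + 1) + 7 ** (n + 1), 5 ** n - 7 ** n)


def d(n: int) -> Fraction:
    # Numerator is 1^2 + 2^2 + ... + n^2.
    return Fraction(sum(k * k for k in range(1, n + 1)), n ** 2)


for n in (10, 100, 1000):
    print(n, float(a(n)), float(b(n)), float(c(n)), float(d(n)))
```

The first three columns settle towards 2, 1/2, and -7 respectively, while the last grows roughly like $n/3$, in agreement with the solutions above.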
Before continuing our discussion of limits, let us introduce a useful
notation. In the examples above it is apparent that the value of the limit of a
sequence involving the ratio of two expressions as $n$ increases, is entirely
determined by the ratio of the most significant terms in the numerator and
denominator. In the case of a polynomial involving $n$, the most significant
term as $n$ increases is obviously the highest degree term in which it appears.
Thus in (a), an inspection of the brackets in the numerator shows the most
significant term to be $2n^3$, and as the denominator only involves $n^3$, it is at
once obvious that for large $n$ the ratio will approach $(2n^3/n^3) = 2$.
To streamline limiting arguments of this type, and yet to preserve something
of the effect of the less significant terms, we now introduce the so-called
'big oh' notation appropriate to functions.
Definition 3-4 We say that the function $f(x)$ is of the order of the function
$g(x)$, written $f(x) = O(g(x))$ if, for some set of values of $x$,
(a) $g(x) > 0$
and
(b) $|f(x)| < Mg(x)$,
where $M$ is some constant.
The value of the constant $M$ is usually unimportant as for most arguments
it suffices that such an $M$ should exist. We have these obvious results:
\[ 2x^3 + 2x + 1 = O(x^3), \]
\[ 3x + \sin x = O(x), \]
\[ \sin x = O(1), \]
where the symbol $O(1)$ has been used to denote a constant.
In terms of this notation we may write the general term $u_n$ in Example 3-3 (a)
in the simplified form
\[ u_n = \frac{2n^3 + O(n^2)}{n^3} \quad \text{whence} \quad u_n = 2 + \frac{O(n^2)}{n^3}. \tag{A} \]
By virtue of the definition of the symbol 'big oh', $O(n^2)$ implies an expression
that is bounded above by $Mn^2$, so that $O(n^2)/n^3 < (Mn^2)/n^3 = M/n$. However,
$M/n \to 0$ as $n$ increases without bound, so that
\[ \lim_{n \to \infty} u_n = 2. \tag{B} \]
Normally the argument just outlined would be omitted, so that result (B)
would be written down immediately after (A).
Implicit in the examples just examined are results which we now combine.
Theorem 3-1 If it can be shown that $u_1, u_2, u_3, \ldots$ and $v_1, v_2, v_3, \ldots$
are two sequences such that $\lim_{n \to \infty} u_n = L$ and $\lim_{n \to \infty} v_n = M$, then
(a) $u_1 + v_1, u_2 + v_2, u_3 + v_3, \ldots$ is a sequence such that
$\lim_{n \to \infty} (u_n + v_n) = L + M$;
(b) $u_1 v_1, u_2 v_2, u_3 v_3, \ldots$ is a sequence such that $\lim_{n \to \infty} u_n v_n = LM$;
(c) provided $M \neq 0$, $u_1/v_1, u_2/v_2, u_3/v_3, \ldots$ is a sequence such that
$\lim_{n \to \infty} (u_n/v_n) = L/M$.
These assertions are virtually self-evident and so we prove only the first
result, making full use of our definition of a limit and of the triangle inequality
of Theorem 1-4.
Suppose $\varepsilon$ is given. Then because $\{u_n\}$ converges to the limit $L$, there
exists a number $N_1$ such that $n > N_1 \Rightarrow |u_n - L| < \tfrac{1}{2}\varepsilon$. By the same
argument there exists another number $N_2$ such that $n > N_2 \Rightarrow |v_n - M| < \tfrac{1}{2}\varepsilon$.
Now $|(u_n + v_n) - (L + M)| = |(u_n - L) + (v_n - M)| \le |u_n - L| + |v_n - M|$,
and so $n > \max(N_1, N_2) \Rightarrow |(u_n + v_n) - (L + M)| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon$. Thus, taking
$N = \max(N_1, N_2)$, and given an arbitrarily small positive number $\varepsilon$, we have
\[ n > N \Rightarrow |(u_n + v_n) - (L + M)| < \varepsilon \]
or
\[ \lim_{n \to \infty} (u_n + v_n) = L + M. \]
In effect, this theorem justifies any argument in which it is asserted that,
if $a$ is close to $A$ and $b$ is close to $B$, then $a + b$ is close to $A + B$, $ab$ is close
to $AB$, and, provided $b \neq 0$ and $B \neq 0$, $a/b$ is close to $A/B$.
Theorem 3-2 Let $\{u_n\}$ and $\{v_n\}$ be two sequences which both converge to
the same limit $L$, and suppose $\{w_n\}$ to be a third sequence. Then if for all $n$
greater than some fixed value $N$, it is true that $u_n \le w_n \le v_n$, the sequence
$\{w_n\}$ converges. Furthermore, the limit of the sequence $\{w_n\}$ is also $L$.
The proof of this theorem is not difficult and so is left to the reader as an
exercise. In essence it involves two stages. The first is to establish that
$\{u_n - w_n\}$ and $\{w_n - v_n\}$ are both null sequences in the sense that they
converge to the limit zero. The second involves the use of Theorem 3-1 (a) to
establish that these two null sequences imply $\lim w_n = L$.
In applications use of this theorem is often confined to proving that a
given sequence $\{w_n\}$ converges, so that the sequences $\{u_n\}$ and $\{v_n\}$ then need
to be devised to satisfy the conditions of the theorem.
Example 3-4 Given that
\[ w_n = 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} + \frac{1}{3 \cdot 2^n}, \]
use Theorem 3-2 to prove that the sequence $\{w_n\}$ converges and to find the
limit.
Now, obviously
\[ 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} < w_n < 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} + \frac{1}{2^n}, \]
and so using the expression for the sum of a geometric progression we may
write
\[ 2[1 - (\tfrac{1}{2})^n] < w_n < 2[1 - (\tfrac{1}{2})^{n+1}]. \]
Thus for the sequence $\{u_n\}$ we take $u_n = 2[1 - (\tfrac{1}{2})^n]$ and for the sequence
$\{v_n\}$ we take $v_n = 2[1 - (\tfrac{1}{2})^{n+1}]$. The conditions of the theorem are then
satisfied, since $\lim u_n = \lim v_n = 2$. Hence the sequence $\{w_n\}$ converges
and has for its limit the value 2.
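The bracketing used in this example can be verified term by term. The sketch below assumes Python with exact fractions and takes $w_n$ to be $1 + 1/2 + \cdots + 1/2^{n-1} + 1/(3 \cdot 2^n)$, as read from the example; the names `w`, `lower`, and `upper` are illustrative.

```python
from fractions import Fraction


def w(n: int) -> Fraction:
    """w_n = 1 + 1/2 + ... + 1/2^(n-1) + 1/(3 * 2^n)."""
    geometric = sum(Fraction(1, 2 ** k) for k in range(n))
    return geometric + Fraction(1, 3 * 2 ** n)


def lower(n: int) -> Fraction:
    """u_n = 2[1 - (1/2)^n], the lower bracketing sequence."""
    return 2 * (1 - Fraction(1, 2 ** n))


def upper(n: int) -> Fraction:
    """v_n = 2[1 - (1/2)^(n+1)], the upper bracketing sequence."""
    return 2 * (1 - Fraction(1, 2 ** (n + 1)))


for n in (1, 5, 10, 20):
    print(n, float(lower(n)), float(w(n)), float(upper(n)))
```

Both bracketing sequences approach 2, so $w_n$ is squeezed towards the same value, exactly as Theorem 3-2 asserts.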
At this stage in our discussion of sequences the following result should be
self evident and we state it in the form of a postulate, rather than prove it.
postulate Every increasing sequence which is bounded above tends to
a limit.
The proof of this postulate is outlined in Problem 3.20 at the end of the
chapter. The details are left to the reader, together with the task of showing
the consequence that every decreasing sequence which is bounded below must
also tend to a limit.
It is this postulate that validates the usual arithmetic procedure for finding
a square root. In the procedure an additional digit is added to the
approximation at each stage, thereby giving rise to an increasing sequence that is
bounded above. With a number such as $\sqrt{2}$ which we know to be irrational,
this same postulate also justifies its successive approximation by the increasing
sequence $\{u_n\}$ of rational numbers $1, 1.4, 1.41, 1.414, 1.4142, \ldots, u_n, \ldots$.
In this case an irrational number $\sqrt{2}$ is determined as the limit of a sequence
of rationals. The implications are important, since although irrational
numbers are of frequent occurrence, in the world in which we live we can
only undertake practical calculations using rationals!
Not all sequences are defined explicitly by giving an expression for the
general term $u_n$. Often a sequence is defined recursively by giving a formula
relating the term $u_n$ to its predecessor $u_{n-1}$, and then specifying the value of
$u_1$. This is, of course, a difference equation, but in this context it is customary
to call any rule of this kind a recurrence relation, and one of considerable
computational importance is
\[ u_n = \frac{1}{m}\left[(m - 1)u_{n-1} + \frac{a}{(u_{n-1})^{m-1}}\right], \]
where $m$ is an integer greater than unity.
The particular significance of this recurrence relation stems from the fact
that by using Theorem 3-2 it is not difficult to prove the rather surprising
result that $\{u_n\}$ always converges to the limit $\sqrt[m]{a}$, irrespective of the choice
of $u_1$ provided only that it is positive. The value of the limit is obvious once
convergence has been established, for denoting it by $L$ and setting
$u_{n-1} = u_n = L$, it follows directly from the recurrence relation that $L^m = a$.
Table 3-1 shows the effectiveness of this method as a computational
procedure or algorithm for computing $\sqrt{2}$ to five decimal places, using three
different starting values for $u_1$. To use the relation to compute $\sqrt{2}$ we must
first set $m = 2$ and $a = 2$ when it becomes
\[ u_n = \frac{1}{2}\left[u_{n-1} + \frac{2}{u_{n-1}}\right]. \]
Taking as representative the three starting values $u_1 = 1$, $1.4$, and $5$, we
obtain Table 3-1 in which a dash signifies that no further change occurs in the
last digit.
Table 3-1

  n    u_1 = 1    u_1 = 1.4    u_1 = 5
  1    1          1.4          5
  2    1.5        1.41429      2.7
  3    1.41667    1.41421      1.72037
  4    1.41422    -            1.44146
  5    1.41421    -            1.41447
  6    -          -            1.41421
Obviously convergence is most rapid when the value assumed for $u_1$ is a
good approximation to the answer, and much effort may be spared by taking
a sensible starting approximation.
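The recurrence relation is easily programmed. The following is a minimal sketch, assuming the relation has the form $u_n = [(m - 1)u_{n-1} + a/(u_{n-1})^{m-1}]/m$, which reduces to the stated special case when $m = 2$ and $a = 2$; the function name `root_sequence` and its parameters are our own choices, and ordinary floating-point arithmetic is used.

```python
def root_sequence(a: float, m: int, u1: float, steps: int) -> list:
    """Return [u_1, ..., u_steps] for the recurrence converging to a**(1/m)."""
    terms = [u1]
    for _ in range(steps - 1):
        prev = terms[-1]
        # u_n = (1/m) * [(m - 1) * u_{n-1} + a / u_{n-1}**(m - 1)]
        terms.append(((m - 1) * prev + a / prev ** (m - 1)) / m)
    return terms


# Reproduce the three columns of the table for the square root of 2.
for start in (1.0, 1.4, 5.0):
    column = [round(t, 5) for t in root_sequence(2.0, 2, start, 6)]
    print(start, column)
```

Rounding each term to five decimal places gives the tabulated entries; where the table shows a dash the computed value simply repeats.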
3-3 The number e
Later we shall use an important mathematical constant that is always denoted
by the symbol $e$. This number is both irrational and transcendental, and for
reference purposes its value to ten decimal places is
\[ e = 2.7182818284. \]
There are numerous different ways of defining this constant, but although
these are interesting, our real concern later in this book will be with the
mathematical use of the constant $e$. We shall, for example, see how it is of
fundamental importance in the study of differential equations and in the
definition of important mathematical functions like the natural logarithm
and the hyperbolic functions $\sinh x$, $\cosh x$, and $\tanh x$.
However, the real purpose of this section will not be to study these
applications, but to examine one interesting definition of $e$ as the limit of a
particular sequence. This problem provides both a first encounter with $e$,
and also a useful illustration of how approximate information may be
extracted from the properties of a difficult sequence. We shall prove that if
\[ e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \tag{3-1} \]
then $2 < e < 3$. The problem of determining $e$ correctly to any given number
of figures will be deferred until we are better equipped for the task.
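Consider the sequence $\{u_n\}$ with the general term
\[ u_n = \left(1 + \frac{1}{n}\right)^n. \]
We will first establish that $\{u_n\}$ is a strictly increasing sequence, so that
$u_{n+1} > u_n$, and then show that the sequence $\{u_n\}$ is bounded above by the
number 3. The postulate of the previous section then establishes that the
limit $e$ exists and is such that $e < 3$. Finally, the lower bound 2 will be added
as a trivial consequence of the proof used to establish the upper bound.
First let us expand $u_n$ by the binomial theorem:
\[ \left(1 + \frac{1}{n}\right)^n = 1 + n\left(\frac{1}{n}\right) + \frac{n(n - 1)}{2!}\left(\frac{1}{n}\right)^2 + \frac{n(n - 1)(n - 2)}{3!}\left(\frac{1}{n}\right)^3 + \cdots + \frac{n(n - 1) \cdots [n - (n - 1)]}{n!}\left(\frac{1}{n}\right)^n. \]
Now rewrite this:
\[ u_n = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \cdots + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{n - 1}{n}\right). \tag{3-2} \]
An exactly similar argument applied to $u_{n+1}$ then gives
\[ u_{n+1} = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n + 1}\right) + \frac{1}{3!}\left(1 - \frac{1}{n + 1}\right)\left(1 - \frac{2}{n + 1}\right) + \cdots + \frac{1}{(n + 1)!}\left(1 - \frac{1}{n + 1}\right)\left(1 - \frac{2}{n + 1}\right) \cdots \left(1 - \frac{n}{n + 1}\right). \]
Now all the terms in $u_n$ and $u_{n+1}$ are positive and $u_{n+1}$ has one more term than
$u_n$. In addition, terms in $u_{n+1}$ that are associated with factorials are larger
than the corresponding terms in $u_n$ because of the obvious inequalities
\[ \left(1 - \frac{r}{n + 1}\right) > \left(1 - \frac{r}{n}\right), \quad r = 1, 2, \ldots, n - 1. \]
Hence $u_{n+1} > u_n$, showing that $\{u_n\}$ is a strictly increasing sequence.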
To show that $\{u_n\}$ is bounded above we must try to sum the finite series
for $u_n$ and then examine the behaviour of the sum as $n$ increases. As the
finite series (3-2) stands we can make no progress, but an overestimate of
this sum can easily be obtained if the terms of the series are simplified. This
approach will suffice for our purposes, since to prove that the limit $e$ exists,
we only need to prove that $\{u_n\}$ is strictly increasing and bounded above; a
strict upper bound is not necessary here. It is only needed when the exact
value of the limit is to be determined.
If we use the obvious inequalities
\[ 1 > \left(1 - \frac{1}{n}\right) > \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) > \cdots \]
it follows at once from Eqn (3-2) that
\[ u_n < 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!}. \tag{3-3} \]
This is still too difficult to sum explicitly, so using the observation
\[ \frac{1}{3!} < \frac{1}{2^2}, \quad \frac{1}{4!} < \frac{1}{2^3}, \quad \ldots, \quad \frac{1}{n!} < \frac{1}{2^{n-1}}, \]
we further simplify Eqn (3-3) to the form
\[ u_n < 1 + 1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots + \frac{1}{2^{n-1}}. \tag{3-4} \]
This can now be summed, since after the first term the remaining terms
form a geometric progression. We arrive at the result
\[ u_n < 1 + \frac{1 - (\tfrac{1}{2})^n}{1 - \tfrac{1}{2}}, \]
whence $\lim_{n \to \infty} u_n < 3$.
The conditions of our postulate are satisfied, so we may conclude that
$\{u_n\}$ has a finite limit $e$ and, furthermore, that $e < 3$. Examination of Eqn
(3-2) shows that $u_n > 2$ for all $n > 1$ so that finally we have established our claim
that
\[ 2 < e < 3. \]
The form of argument used to overestimate series (3-2) is often useful and
the final inequality (3-4) is usually called a majorizing series.
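The two properties established above, monotonic increase and the majorizing bound, can be checked for the first few dozen terms. This is a sketch only, assuming Python with exact fractions; the names `u` and `majorant` are illustrative, and `majorant(n)` denotes the right-hand side of the geometric bound $1 + 1 + 1/2 + \cdots + 1/2^{n-1}$. Non-strict comparison is used because for $n = 1$ the two sides coincide.

```python
from fractions import Fraction


def u(n: int) -> Fraction:
    """u_n = (1 + 1/n)^n, computed exactly."""
    return (1 + Fraction(1, n)) ** n


def majorant(n: int) -> Fraction:
    """1 + (1 + 1/2 + ... + 1/2^(n-1)), the geometric overestimate."""
    return 1 + sum(Fraction(1, 2 ** k) for k in range(n))


terms = [u(n) for n in range(1, 60)]
for n in (1, 2, 5, 10, 50):
    print(n, float(u(n)), float(majorant(n)))
```

The printed values of $u_n$ creep upwards towards 2.718..., always below the majorant, which itself never reaches 3.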
Closely related to limit (3-1) is the sequence $\{v_n(x)\}$ with general term
\[ v_n(x) = \left(1 + \frac{x}{n}\right)^n. \tag{3-5} \]
To establish the relationship that exists between $e$ and the limit of $\{v_n(x)\}$
let us first denote the limit by $E(x)$, so that
\[ E(x) = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n. \tag{3-6} \]
Suppose $x > 0$ to be any rational number and define an increasing sequence
$\{n_k\}$ of natural numbers by the requirement that the numbers $n_k/x$ are integral.
Henceforth we shall set $N_k = n_k/x$. Then by restricting $n$ to be a member of
$\{n_k\}$ we may define a sub-sequence $\{v_{n_k}(x)\}$ of $\{v_n(x)\}$ for which Eqn (3-5) may
be written in the form
\[ v_{n_k}(x) = \left(1 + \frac{1}{N_k}\right)^{N_k x} = \left[\left(1 + \frac{1}{N_k}\right)^{N_k}\right]^x. \tag{3-7} \]
Using the definition of $u_n$ we see that
\[ v_{n_k}(x) = (u_{N_k})^x, \]
so that taking the limit as $n_k \to \infty$ we have
\[ E(x) = \lim_{n_k \to \infty} v_{n_k}(x) = \left[\lim_{N_k \to \infty} u_{N_k}\right]^x = e^x. \]
Whence the important result
\[ E(x) = e^x. \tag{3-8} \]
With a more subtle argument it can be established that Eqn (3-8) is
generally true without the restriction of $n$ to the sequence $\{n_k\}$. This implies
that the result is true for all real $x$.
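The sub-sequence argument can be illustrated for a particular rational $x$. The sketch below assumes Python floating-point arithmetic and takes $x = 3/2$ with $n_k = 3k$, so that $N_k = n_k/x = 2k$ is an integer; the names `u` and `v` follow the text, while the choice of $x$ and $n_k$ is ours.

```python
import math


def u(n: int) -> float:
    """u_n = (1 + 1/n)^n."""
    return (1 + 1 / n) ** n


def v(n: int, x: float) -> float:
    """v_n(x) = (1 + x/n)^n."""
    return (1 + x / n) ** n


x = 1.5  # the rational number 3/2
for k in (1, 10, 100, 1000):
    n_k = 3 * k  # chosen so that N_k = n_k / x = 2k is an integer
    N_k = 2 * k
    print(n_k, v(n_k, x), u(N_k) ** x, math.exp(x))
```

The second and third columns agree to within rounding error, as the identity between $v_{n_k}(x)$ and $(u_{N_k})^x$ requires, and both approach the value of $e^x$ in the last column.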
Fig. 3-3 Graph of the functions $e^x$ and $e^{-x}$.
The function $e^x$ is one of the most important functions in mathematics
and it is called the exponential function. Fig. 3-3 shows its behaviour with $x$.
Notice that it is an essentially positive function which is strictly monotonic
increasing with $x$. Also shown on the figure is the associated function $e^{-x}$.
3-4 Limits of functions— continuity
The notion of the limit of a function $f(x)$ as $x$ tends towards some value $a$
is intuitively obvious in the case of functions whose graph is an unbroken
curve. A typical function of this kind is illustrated in Fig. 3-4 from which it
is easily seen that if $x$ is considered to be a moving point, then $f(x)$ will
approach the value $f(a)$ as $x$ approaches $a$ from either the left or the right.
In this case $f(x)$ actually attains the value $f(a)$, and we shall speak of $f(a)$ as
the 'limit of $f(x)$ as $x$ tends to $a$' and write
\[ \lim_{x \to a} f(x) = f(a). \]
Fig. 3-4 Function $f(x)$ with unbroken graph.
Thus, if $f(x) = x^3 - 2x^2 + x + 3$, then clearly in this case
$\lim_{x \to 2} f(x) = 5 = f(2)$. A slightly less obvious example involves finding
$\lim_{x \to 1} f(x)$ when
\[ f(x) = \frac{\sqrt{x} - 1}{x - 1}, \]
since the formal substitution of $x = 1$ in $f(x)$ seems to yield $0/0$ which is
meaningless as it stands. The difficulty here is easily resolved by cancelling a
factor $(\sqrt{x} - 1)$ in the numerator and denominator to give
\[ f(x) = \frac{1}{\sqrt{x} + 1}, \]
from which it is apparent that $\lim_{x \to 1} f(x) = \tfrac{1}{2}$.
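The value of this limit can be checked by evaluating the quotient close to, but never at, $x = 1$. The following is a small sketch assuming Python floating-point arithmetic; the names `f` and `f_simplified` are illustrative, standing for the original quotient and the form obtained after cancelling the common factor.

```python
import math


def f(x: float) -> float:
    """Original form (sqrt(x) - 1)/(x - 1); undefined at x = 1."""
    return (math.sqrt(x) - 1) / (x - 1)


def f_simplified(x: float) -> float:
    """Form after cancelling the factor (sqrt(x) - 1)."""
    return 1 / (math.sqrt(x) + 1)


for h in (1e-2, 1e-4, 1e-6):
    print(h, f(1 - h), f(1 + h), f_simplified(1 + h))

print(f_simplified(1.0))  # the cancelled form may be evaluated at x = 1
```

From either side the values approach 0.5, while the original quotient itself cannot be evaluated at $x = 1$.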
In effect, the intuitive notion involved in the limit of a function is essen-
tially the same as that for the limit of a sequence. Namely, we say that L is the
limit of/0) as x tends to a if, for all x sufficiently close to a,f(x) is close to L.
In fact, the determination of the value of the limit L involves the behaviour of
f(x) near to x = a, but does not consider the actual value of f(x) at x = a.
Fig. 3-5 Function $f(x)$ has a smooth graph and attains the limit L at x = a. (Marked on the x-axis: the domains $0 < |x - a| < \delta$ and $0 < |x - b| < \delta'$.)
Whether or not $f(a)$ is actually equal to L, as was the case above, is immaterial.
By only slightly modifying our definition of the limit of a sequence, we arrive
at the following definition of the limit of a function, which is illustrated in
Fig. 3-5, and will be used for our subsequent discussion of continuity.
Definition 3-5 The function $f(x)$ will be said to tend to the limit L as x
tends to a if, and only if, for any arbitrarily small positive number $\varepsilon$, there
exists a small positive number $\delta$ such that
$$0 < |x - a| < \delta \Rightarrow |f(x) - L| < \varepsilon.$$
The significance of the condition $0 < |x - a| < \delta$ is that the value
$f(a)$ is specifically excluded from consideration as being irrelevant to the
determination of the limit. Thus, if
$$f(x) = \begin{cases} 1 + x^2 & \text{for } x \ne 1, \\ 5 & \text{for } x = 1, \end{cases}$$
then $\lim_{x \to 1} f(x) = 2$, despite the fact that $f(1) = 5$.
If the graph of a function $f(x)$ is not unbroken then more care must be
exercised when discussing the notion of a limit. The reason can be seen after
examination of Fig. 3-6 in which the graph has a break at x = c, at which
point the functional value $f(c)$ has been allocated arbitrarily. This graph
defines a perfectly satisfactory function, but as x approaches c from either the
left or the right, so $f(x)$ approaches either the value $L_-$ or $L_+$ which are
obviously limits in some sense. Furthermore $L_- \ne L_+$ and neither is equal to
$f(c)$. To take account of this, we introduce the concepts of a limit from the
left and a limit from the right.
Fig. 3-6 Function $f(x)$ has broken graph.
To simplify the explanation we shall write $x \to a-$ in place of 'x tends to
a from the left' and $x \to a+$ in place of 'x tends to a from the right'. In terms
of this notation the function $f(x)$ in Fig. 3-6 has the property that
$\lim_{x \to c-} f(x) = L_-$ and $\lim_{x \to c+} f(x) = L_+$, which is indicated in the
diagram by means of arrows. Once
again, in arriving at the limits from the left and right of a point, the functional
value itself at that point is not involved. It may or may not equal one of
the two limits so defined. These ideas may be expressed formally as a definition.
Definition 3-6 The function $f(x)$ will be said to have the left-hand limit,
or limit from the left, $L_-$ as $x \to a-$ if, and only if, for any arbitrarily small
positive number $\varepsilon$, there exists a small positive number $\delta$ such that
$$0 < a - x < \delta \Rightarrow |f(x) - L_-| < \varepsilon.$$
A corresponding definition exists for the right-hand limit, or limit from
the right, as $x \to a+$, in which $L_-$ is replaced by $L_+$ and the condition
$0 < a - x < \delta$ is replaced by $0 < x - a < \delta$.
Notice that the function f(x) in Fig. 3-6 only has one-sided limits at
x = a and x = d and, even though $f(x)$ has a cusp at x = b, and so is not
smooth there, it nevertheless still has a limit in the ordinary sense at that
point. This is because of the following obvious result.
Theorem 3-3 If $f(x)$ has identical left- and right-hand limits at a point
x = a so that $L_- = L_+ = L$, say, then $\lim_{x \to a} f(x)$ exists and is also equal to L.
We shall usually resolve simple limit problems of the type just discussed
either intuitively or, perhaps, by appeal to a graph. However, for complete-
ness, we now apply the formal definition of a left-hand limit to a specific
function to show, in principle, how it may be used as an analytical tool in
less obvious situations.
For our example we apply the formal definition of a left-hand limit at the
point x = 1 to the function
$$f(x) = \begin{cases} x^2 & \text{for } x < 1, \\ \ldots & \text{for } x > 1. \end{cases}$$
Clearly the left-hand limit at x = 1 is determined only by the behaviour
of $f(x)$ to the left of that point. The behaviour of $f(x)$ for x > 1 is irrelevant
to the determination of $\lim_{x \to 1-} f(x)$. Obviously, as $x \to 1-$ so $x^2 \to 1$, and thus,
intuitively, $\lim_{x \to 1-} f(x) = 1$.
If our intuitive argument is correct and this limit is in agreement with our
definition, we must show that for any $\varepsilon > 0$ we can find a positive $\delta$, which
will probably depend on $\varepsilon$, such that $|x^2 - 1| < \varepsilon$ when
$0 < 1 - x < \delta$ or, equivalently, $1 - \delta < x < 1$.
We have $|f(x) - L_-| = |x^2 - 1| = |(x - 1)(x + 1)| = |x - 1| \cdot |x + 1|$,
but since $|x - 1| < \delta$ this becomes
$$|x^2 - 1| < \delta |x + 1|. \qquad \text{(A)}$$
Since x < 1 (and we may suppose $\delta < 1$, so that x > 0), we overestimate x in (A)
if we replace it by the value unity so that we have
$$|x^2 - 1| < 2\delta. \qquad \text{(B)}$$
Finally, to make this expression less than any small positive number $\varepsilon$,
we need only make $2\delta < \varepsilon$. This finally proves that $\lim_{x \to 1-} f(x) = 1$.
Some numbers might help here. Suppose, for example, we wish to find the
condition that $f(x)$ should be within 0.001 of the left-hand limit at x = 1.
This amounts to asking that $|x^2 - 1| < 0.001$, which is equivalent to setting
$\varepsilon = 0.001$. Hence, as $\delta < \tfrac{1}{2}\varepsilon = 0.0005$, our x-inequality $1 - \delta < x < 1$ tells
us that the required condition on $f(x)$ will be satisfied provided 0.9995 < x < 1.
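The relationship between the two small numbers in this argument can be checked by direct sampling. The following sketch is an illustration only: the helper names `delta_for` and `check` are ours, the tolerance is called `eps`, and the number of sample points is arbitrary. It takes half the tolerance as the interval width, as bound (B) suggests, and tests many points just to the left of x = 1.

```python
def f(x):
    # Only the branch to the left of x = 1 matters for the left-hand limit.
    return x * x

def delta_for(eps):
    # Bound (B) gives |x^2 - 1| < 2*delta, so delta = eps/2 is sufficient.
    return eps / 2.0

def check(eps, samples=10_000):
    """Test |f(x) - 1| < eps at sample points strictly inside (1 - delta, 1)."""
    delta = delta_for(eps)
    xs = [1.0 - delta * (k + 0.5) / samples for k in range(samples)]
    return all(abs(f(x) - 1.0) < eps for x in xs)

for eps in (0.1, 0.001, 1e-6):
    print(f"eps = {eps:g}   delta = {delta_for(eps):g}   holds: {check(eps)}")
```

A finite sample cannot replace the proof, but it does show the inequality holding for every tested point at each tolerance.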
In higher mathematics this analytical approach is indispensable but, as
already remarked, for our purposes a graphical approach to the limit of a
function must suffice in most cases. An exception is the discussion of indeter-
minate forms which involve finding the limit of a quotient as x approaches
some value at which both numerator and denominator vanish. This will
be taken up again later as an application of calculus though the reader should
notice that we have already resolved one such simple problem involving a
limit of the form 0/0.
Although a function such as
$$f_1(x) = \begin{cases} x^2 + 1 & \text{for } x \text{ an integer}, \\ 3 & \text{for all other } x \end{cases}$$
is a perfectly satisfactory function from the mathematical point of view, it is
not likely to occur in connection with physical problems. We make this
assertion because in the physical world functional relationships are usually
smoothly changing in the sense that a small change in the independent
variable usually produces only a small change in the dependent variable.
This is not always the case however and, for example, in gas flows involving a
gas shock wave the gas pressure experiences a sudden jump across a geo-
metrical surface in space called the shock front. Hence a graph of the gas
pressure p across a plane shock at x = a, as a function of the distance x
measured normal to the shock front, could appear as in Fig. 3-7.
Fig. 3-7 Gas pressure p as a function of distance normal to shock front at x = a. (Marked on the graph: the shock front, and the pressure jump $p_1 - p_2$ across it.)
Nevertheless, despite the existence of common physical situations of this
type, a function as erratic as $f_1(x)$ is not likely to be encountered in the real
world. Aside from points at which a jump occurs, the 'reasonable' functions
that occur in physics and engineering must be expected to have the smooth-
ness-of-change property we described earlier.
This smoothness-of-change property is given the mathematical name
continuity and plays an important part throughout all mathematical analysis.
If the reader pauses to think for a moment he will see that the following
definition describes continuity in terms of the left- and right-hand limits.
Definition 3-7 The function $f(x)$ is said to be continuous at $x = x_0$ if:
(a) $\lim_{x \to x_0-} f(x) = \lim_{x \to x_0+} f(x) = L$
and
(b) $f(x_0) = L$.
In this definition, (a) demands the equality of the left- and right-hand
limits and (b) ensures that there is no 'gap' in the graph of $f(x)$ at $x = x_0$.
That is to say that the point $(x_0, f(x_0))$ lies on an unbroken curve and so
coincides with the limits (a). An alternative, but equivalent, definition of
continuity that is often used replaces (a) by the requirement that
$\lim_{x \to x_0} f(x) = L$ but still retains (b). Either form of definition is equally good but we
have chosen to emphasize the ideas of left- and right-hand limits since they
find important applications in engineering and physics.
Continuity essentially describes a property of a function in the neigh-
bourhood of a point of interest and not just at the point itself. Accordingly,
a function will be said to be continuous in the interval (a, b) if it is continuous
at all points x within (a, b).
Notice that the effect of condition (b) of our definition on a function such
as
$$f(x) = \begin{cases} x^2 + 1 & \text{for } x \ne 1, \\ 6 & \text{for } x = 1 \end{cases}$$
is to show that/(x) is continuous everywhere except at x = 1.
Let us paraphrase the notion of continuity. In effect, by requiring that a
function $f(x)$ be continuous at x = a, we are insisting that if the variation
of the function about the value $L = f(a)$ is not to exceed $\pm\varepsilon$, where $\varepsilon > 0$
is arbitrary, then we can find an x-interval of width $2\delta$ centred on x = a
within which this property is always true. This is illustrated by Fig. 3-5, which
also indicates that in general the number $\delta$ depends on both $\varepsilon$ and the value
of x at which $f(x)$ is continuous. Thus for the same value of $\varepsilon$, the interval
about x = a is of width $2\delta$, whereas the interval about x = b is of width
$2\delta'$, with $\delta' \ne \delta$.
If the function $f(x)$ is continuous in a closed interval $[x_1, x_2]$ and $\varepsilon$ is
given, consider the point x = b at which the function changes most rapidly,
and find the appropriate interval of width $2\delta'$ centred on x = b in which the
functional variation from $f(b)$ does not exceed $\pm\varepsilon$. Because the functional
variation at x = b was the greatest of any point in $[x_1, x_2]$, it is obvious that
if this same interval of length $2\delta'$ is associated with any other point $x'$ in
$[x_1, x_2]$, then the functional variation within that interval will certainly differ
by less than $\pm\varepsilon$ from the value $f(x')$. Hence we can assert that for a function
$f(x)$ which is continuous in a closed interval, when given an $\varepsilon$ it is possible
to find a number $\delta$ for the definition of continuity which depends only on $\varepsilon$
and in no way on the value of x at which continuity is being discussed.
Because of this continuity property which applies uniformly to points
throughout the closed interval $[x_1, x_2]$ we speak of such functions as being
uniformly continuous. This concept proves to be of extreme importance when
these ideas are pursued further.
The requirement of continuity in a closed interval cannot be relaxed, for
then the result is no longer true. For example, the function $f(x) = 1/x$ defined
in the semi-open interval (0, 2] is continuous, but not uniformly continuous.
This is because for any given $\varepsilon$, the closer we take our point $x'$ to the origin,
the smaller must we take the value of $\delta$ in order to satisfy $|f(x) - f(x')| < \varepsilon$
for $|x - x'| < \delta$. There is obviously no single positive value of $\delta$ that will apply to
the entire interval.
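The shrinking of the admissible interval width for the function 1/x can be made quantitative. The sketch below is an illustration under our own assumptions: the function name `largest_delta`, the tolerance 0.1 and the list of points are all ours. It uses the closed-form width obtained by solving $1/(x' - \delta) - 1/x' = \varepsilon$ for $\delta$, which is the binding case because 1/x changes fastest on the side nearer the origin.

```python
def largest_delta(x_prime, eps):
    """Largest delta with |1/x - 1/x'| < eps whenever |x - x'| < delta.

    Solving 1/(x' - delta) - 1/x' = eps for delta gives
    delta = eps * x'^2 / (1 + eps * x'); the side x > x' is less demanding.
    """
    return eps * x_prime ** 2 / (1.0 + eps * x_prime)

eps = 0.1
points = [2.0, 1.0, 0.1, 0.01, 0.001]
deltas = [largest_delta(xp, eps) for xp in points]

for xp, d in zip(points, deltas):
    print(f"x' = {xp:<6g} largest delta = {d:.3e}")
```

For a fixed tolerance the printed widths shrink roughly like the square of $x'$, so no one positive width serves every point of (0, 2].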
There are a number of immediate consequences of the definition of a
limit of a function and of the definition of continuity which we now state as
two important theorems.
Theorem 3-4 (limits) Suppose that $\lim_{x \to x_0} f(x) = L$ and $\lim_{x \to x_0} g(x) = M$, then
(a) $\lim_{x \to x_0} [f(x) + g(x)] = L + M$;
(b) $\lim_{x \to x_0} f(x)g(x) = LM$;
(c) provided $M \ne 0$, $\lim_{x \to x_0} [f(x)/g(x)] = L/M$.
The proof of these results is similar in all respects to the proof of Theorem
3-1 and since a representative example was presented there we shall not
repeat the argument again.
Theorem 3-5 (continuity) If $f(x)$ and $g(x)$ are continuous at $x = x_0$, then
so also are the functions
(a) $f(x) + g(x)$;
(b) $f(x)g(x)$;
(c) $f(x)/g(x)$, provided $g(x_0) \ne 0$.
If, furthermore, $f(x)$ is continuous at $x = x_0$ and $g(u)$ is continuous at
$u = f(x_0)$, then the continuous function of a continuous function $g[f(x)]$
is continuous at $x = x_0$.
Once again the proof of this theorem is similar in all respects to the proof
of Theorem 3-1. However for the curious reader we shall prove result 3-5 (a),
using the alternative definition of continuity that we mentioned.
To prove $f(x) + g(x)$ is continuous at $x = x_0$ we must establish that
$\lim_{x \to x_0} (f(x) + g(x)) = L$ exists and that $f(x_0) + g(x_0) = L$. Now as $f(x)$ and
$g(x)$ are continuous at $x = x_0$ by supposition, then $\lim_{x \to x_0} f(x) = f(x_0)$ and
$\lim_{x \to x_0} g(x) = g(x_0)$ and so for any positive $\varepsilon$ there must exist positive numbers
$\delta_1$ and $\delta_2$ such that $|x - x_0| < \delta_1 \Rightarrow |f(x) - f(x_0)| < \tfrac{1}{2}\varepsilon$ and
$|x - x_0| < \delta_2 \Rightarrow |g(x) - g(x_0)| < \tfrac{1}{2}\varepsilon$. Now,
$$|(f(x) + g(x)) - (f(x_0) + g(x_0))| = |(f(x) - f(x_0)) + (g(x) - g(x_0))| \le |f(x) - f(x_0)| + |g(x) - g(x_0)|$$
and
$$|x - x_0| < \text{smaller of } (\delta_1, \delta_2) \Rightarrow |f(x) - f(x_0)| + |g(x) - g(x_0)| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon.$$
Thus, given any positive $\varepsilon$, we have established that by taking $\delta$ no larger than
the smaller of $\delta_1$ and $\delta_2$ we ensure that $|(f(x) + g(x)) - (f(x_0) + g(x_0))| < \varepsilon$
whenever $|x - x_0| < \delta$. This formally
proves our assertion. The proofs of results (b) and (c) are similar.
Arguments involving continuity usually rely for their success on the
knowledge that certain familiar functions are continuous. Once a small list
of such functions has been established it can then be considerably enlarged
by repeated applications of Theorem 3-5. Accordingly, we present below a
table of functions, in each case stating the intervals in which they are con-
tinuous. No proof will be given for most entries since the results are obvious
from the graphs but for the sake of completeness we shall formally prove the
first three entries.
Example 3-5
(a) Given that C = constant, the function $f(x) = C$ is continuous everywhere.
The proof is trivial, since for any $x = x_0$, $f(x_0) = C$ showing that the definition
is always satisfied.
(b) The function $f(x) = x$ is continuous everywhere.
The proof is again trivial, but let us indicate how the alternative definition of
continuity may be used. We must prove that for all $x_0$, $\lim_{x \to x_0} f(x)$ exists and is
equal to $f(x_0)$. Now it is obvious from the definition of $f(x)$ that $f(x_0) = x_0$.
Also, for any $x_0$ and given $\varepsilon > 0$, $|f(x) - f(x_0)| = |x - x_0|$, so that
$|x - x_0| < \varepsilon \Rightarrow |f(x) - f(x_0)| < \varepsilon$ and in this case we may take $\delta = \varepsilon$. The function is thus
continuous at $x = x_0$ and, as $x_0$ was arbitrary, it finally follows that $f(x) = x$
is continuous everywhere.
(c) The function $f(x) = x^n$ with n a positive integer is continuous everywhere.
We give a proof by induction. Suppose the result is true for some n so that
$x^n$ is continuous at $x = x_0$ for all $x_0$. Now $x^{n+1} = x \cdot x^n$, and we have just
proved that x is continuous at $x_0$. Hence, using Theorem 3-4 (b), $x^{n+1}$ is
continuous. The result is true for n = 1 and so by the principle of induction
it is true for all n. With a little more care this result can be shown to be true
for any real positive n and not just for n a natural number.
The information contained in this table is likely to be useful on many
occasions and so should be memorized. Its application, together with
Theorem 3-5, to questions of continuity is usually immediate. Thus, for
example, the function $f(x) = 1/x + \sin x$ is continuous everywhere except at
the point x = 0, and $f(x) = (x^m + a_1 x^{m-1} + \cdots + a_m)/\sin x$, with m > 0,
is continuous everywhere except at the points $x = n\pi$ for which n is an integer.
Table 3-2 Short list of continuous functions

Function f(x)                                      Interval over which f(x) is continuous
C (constant)                                       $(-\infty, \infty)$
$x$                                                $(-\infty, \infty)$
$x^n$ (n > 0)                                      $(-\infty, \infty)$
$x^{-n}$ (n > 0)                                   $(-\infty, \infty)$ excluding point x = 0
$|x|$                                              $(-\infty, \infty)$
$x^n + a_1 x^{n-1} + \cdots + a_n$ (n > 0)         $(-\infty, \infty)$
$\dfrac{x^n + a_1 x^{n-1} + \cdots + a_n}{x^m + b_1 x^{m-1} + \cdots + b_m}$     $(-\infty, \infty)$ excluding the zeros of the denominator
$\sin x$                                           $(-\infty, \infty)$
$\cos x$                                           $(-\infty, \infty)$
$\tan x$                                           $(2n - 1)\frac{\pi}{2} < x < (2n + 1)\frac{\pi}{2}$, integral n
$\sec x$                                           $(2n - 1)\frac{\pi}{2} < x < (2n + 1)\frac{\pi}{2}$, integral n
$\operatorname{cosec} x$                           $n\pi < x < (n + 1)\pi$, integral n
$\cot x$                                           $n\pi < x < (n + 1)\pi$, integral n

Finally, in preparation for our use of limits in connection with the techniques
of differentiation, we extend the O-notation to include functions of
smaller order. Henceforth, we shall write
$$f(x) = o(g(x)) \quad \text{as } x \to x_0$$
with the meaning that
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = 0.$$
The symbol o is read 'little oh' and in words the statement asserts that the
function $f(x)$ is of smaller order than $g(x)$ as $x \to x_0$. For example, we may
write $(1 + x^2)^3 = 1 + 3x^2 + o(x^3)$ as $x \to 0$, since
$(1 + x^2)^3 - 1 - 3x^2 = 3x^4 + x^6 = o(x^3)$ as $x \to 0$.
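The little-oh statement in the example just given can be checked with exact arithmetic. In the sketch below, which is only an illustration with names of our own choosing, rational numbers are used in place of floating-point values so that the tiny remainder is not lost to rounding; the ratio of the remainder to $x^3$ should equal $3x + x^3$ and so shrink with x.

```python
from fractions import Fraction

def remainder(x):
    """What is left of (1 + x^2)^3 after removing the terms 1 + 3x^2."""
    return (1 + x * x) ** 3 - 1 - 3 * x * x

# Exact rational sample points x = 1/10, 1/100, ... tending to zero.
xs = [Fraction(1, 10 ** k) for k in range(1, 6)]
ratios = [remainder(x) / x ** 3 for x in xs]

for x, r in zip(xs, ratios):
    print(f"x = {float(x):.0e}   remainder / x^3 = {float(r):.6e}")
```

The same experiment with $x^4$ in the denominator would give ratios tending to 3 rather than to zero, which is why the remainder is $o(x^3)$ but not $o(x^4)$.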
3-5 Functions of several variables — limits, continuity
The related concepts of a limit and the continuity of a function extend without
difficulty to functions of more than one independent variable, provided only
that the notion of the proximity of two points is suitably extended. The ideas
involved here can best be appreciated if we confine attention to functions
f(x, y) of the two independent variables x and y.
Let us suppose that $f(x, y)$ has for its domain of definition some region D
in the (x, y)-plane and that $(x_0, y_0)$ is some point interior to D. Then, before
considering $f(x, y)$, we must first make clear what is to be meant by $x \to x_0$,
$y \to y_0$ in D.
Fig. 3-8 Paths for which the point $(x, y) \to (x_0, y_0)$.
An inspection of Fig. 3-8 shows that starting from the points P and Q in
D, both the full curve and the dotted curve describe possible paths by which
x and y may tend to $x_0$ and $y_0$. In general, we shall write $x \to x_0$, $y \to y_0$, or,
say that the point (x, y) tends to the point $(x_0, y_0)$, if $\rho \to 0$, where
$\rho = \sqrt{(x - x_0)^2 + (y - y_0)^2}$ is the distance between the moving point
(x, y) and the fixed point $(x_0, y_0)$. This simple device then allows us to
interpret a statement about the two variables x and y in terms of a statement
about the single variable $\rho$. By confining attention to a circular region of
radius $\delta$ centred on $(x_0, y_0)$ we may conveniently define a neighbourhood of
the point $(x_0, y_0)$. Any rectangle or other simple closed geometrical curve
containing $(x_0, y_0)$ would, of course, serve equally well to define a neighbourhood
of $(x_0, y_0)$. When using such a neighbourhood it may or may not be
necessary to exclude the boundary and the point $(x_0, y_0)$ itself from the
definition of the neighbourhood.
Thus, for example, the square x = 0, y = 0, x = 1, and y = 1 defines a
neighbourhood of the point $(\tfrac{1}{2}, \tfrac{1}{2})$. The function
$$f(x, y) = 1/\{xy(x - 1)(y - 1)(x - \tfrac{1}{2})(y - \tfrac{1}{2})\}$$
is defined in this neighbourhood, but not at $(\tfrac{1}{2}, \tfrac{1}{2})$, on the boundary or on
$x = \tfrac{1}{2}$, $y = \tfrac{1}{2}$.
Definition 3-8 is now proposed, with this interpretation of $x \to x_0$,
$y \to y_0$ firmly in mind.
Definition 3-8 The function $f(x, y)$ will be said to tend to the limit L as
$x \to x_0$ and $y \to y_0$, and we shall write
$$\lim_{\substack{x \to x_0 \\ y \to y_0}} f(x, y) = L,$$
if, and only if, the limit L is independent of the path followed by the point
(x, y) as $x \to x_0$ and $y \to y_0$.
As before, we do not necessarily require that $f(x_0, y_0) = L$, as the functional
value actually at the limit point $(x_0, y_0)$ is not involved in the limit
process. If it can be established that the result of the limiting operation depends
on the path taken then, demonstrably, the function has no limit. The following
examples make these ideas clear and, on account of their simplicity, are
offered without proof.
Example 3-6
(a) If fix, y) = — "— , then lim = — ;
!/— 1
(c) if f(x, y) = -— — f— then lim — - — -f-
x*+y*+V t-.fr x* + y* + I 8+772
(d) if f(x, y) = — -, then lim f(x,y) does not exist since
yi x — 1) a;_»i
$\lim_{x \to 1,\, y \to 1} f(x, y) = 1$ if taken along the line y = x, but
$\lim_{x \to 1,\, y \to 1} f(x, y) = -1$ if taken along the line y = 2 - x.
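Path dependence of this kind is easy to exhibit numerically. The sketch below does not use any of the functions of Example 3-6; as a stand-in it takes the function $g(x, y) = xy/(x^2 + y^2)$, which is our own choice, and approaches the origin along two straight lines. The names `g` and `along_line` are likewise ours.

```python
def g(x, y):
    """Stand-in example: x*y / (x^2 + y^2), undefined at the origin."""
    return x * y / (x * x + y * y)

def along_line(m, t):
    """Value of g at the point (t, m*t) on the line y = m*x."""
    return g(t, m * t)

# March towards the origin along y = x and along y = -x.
ts = [10.0 ** (-k) for k in range(1, 8)]
on_y_equals_x = [along_line(1.0, t) for t in ts]
on_y_equals_minus_x = [along_line(-1.0, t) for t in ts]

for t, a, b in zip(ts, on_y_equals_x, on_y_equals_minus_x):
    print(f"t = {t:.0e}   along y = x: {a:+.3f}   along y = -x: {b:+.3f}")
```

On the line y = mx this function takes the constant value $m/(1 + m^2)$, so the two columns settle on different numbers and, in the sense of Definition 3-8, no limit exists at the origin.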
As might be expected, the concept of continuity of a function $f(x, y)$ of
two variables then follows as a direct extension of the definition of a limit.
Definition 3-9 The function $f(x, y)$ will be said to be continuous at the
point $(x_0, y_0)$ if:
(a) $\lim_{\substack{x \to x_0 \\ y \to y_0}} f(x, y) = L$ exists
and
(b) $f(x_0, y_0) = L$.
We shall say that $f(x, y)$ is continuous in a region if it is continuous at all
points (x, y) belonging to that region. Notice that condition (a) demands that
$f(x, y)$ has a unique limit as $x \to x_0$ and $y \to y_0$, and condition (b) then ensures
that there is no 'hole' in the surface $z = f(x, y)$ at the point $(x_0, y_0)$. The
continuity of a function f(x, y) is illustrated in Fig. 3-9 where a circular
neighbourhood of the point (xo, yo) is shown in relation to the surface. In
effect, continuity of/(x, y) is simply requiring that a small change in location
of the point (x, y) will cause only a small change in z = f(x, y).
Fig. 3-9 Continuity of $f(x, y)$ at $(x_0, y_0)$ and discontinuity at (a, b).
In Fig. 3-9 the point (a, b) has been deliberately detached from the other-
wise unbroken surface z =f(x,y), so that the function f(x, y) does not
satisfy the definition there and hence is not continuous at that single point. In
general, a function of one or more variables which is not continuous at a
point will be said to have a discontinuity at that point or, alternatively, to be
discontinuous there. Thus the function of one variable shown in Fig. 3-6
has a discontinuity at x = c and the function of two variables shown in Fig.
3-9 is discontinuous at x = a, y = b.
These ideas also extend to functions of several real variables in an obvious
manner once the 'distance' between two points has been defined satisfactorily.
For functions $f(x, y, z)$ of the three independent variables x, y, z a suitable
distance function between points $(x_1, y_1, z_1)$ and $(x_0, y_0, z_0)$ is the linear
distance between them when plotted as points relative to three mutually
perpendicular Cartesian axes. The distance $\rho$ is then given by the Pythagoras rule
as $\rho = \{(x_1 - x_0)^2 + (y_1 - y_0)^2 + (z_1 - z_0)^2\}^{1/2}$.
The interpretation of distance in the so-called finite dimensional spaces of
n dimensions generated by functions of n independent variables is of
considerable importance in mathematics. Essentially, of any function $\rho(P, Q)$
measuring the distance between points P and Q in the space we require that
for any points P, Q, and R:
(a) $\rho(P, Q) \ge 0$,
(b) $\rho(P, Q) = 0$ if, and only if, P = Q,
(c) $\rho(P, Q) = \rho(Q, P)$,
(d) $\rho(P, R) \le \rho(P, Q) + \rho(Q, R)$.
It is easy to check that the two distance functions already defined satisfy the
above conditions, but this will be left as an exercise for the reader.
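Before attempting the exercise analytically, the four conditions can be spot-checked by machine. The following sketch is an illustration only: the function names, the random test points and the small tolerance allowed for rounding are all our own assumptions. It applies the Pythagoras distance to randomly chosen points in two and three dimensions.

```python
import math
import random

def rho(p, q):
    """Pythagoras (Euclidean) distance between points p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def satisfies_conditions(p, q, r, tol=1e-12):
    """Check conditions (a) to (d) for one triple of points."""
    return (
        rho(p, q) >= 0.0                                   # (a)
        and (rho(p, q) == 0.0) == (p == q)                 # (b)
        and abs(rho(p, q) - rho(q, p)) <= tol              # (c)
        and rho(p, r) <= rho(p, q) + rho(q, r) + tol       # (d)
    )

def random_point(n):
    return tuple(random.uniform(-10.0, 10.0) for _ in range(n))

random.seed(0)
all_ok = all(
    satisfies_conditions(random_point(n), random_point(n), random_point(n))
    for n in (2, 3)
    for _ in range(1000)
)
print("all sampled triples satisfy (a)-(d):", all_ok)
```

Agreement on sampled points is of course no proof; condition (d) in particular still requires an argument valid for every triple.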
Again the determination of the regions in which any given function is
continuous will usually be done either on an intuitive or on a graphical basis.
Thus, in Example 3-6 it is easily seen that:
2x
(a) f(x,y) = -— — 2 is continuous everywhere;
xy + 1 .
(b) f(x, y) = 2 2 is continuous everywhere except at x = 0, y = 0;
x ~t" y
(c) $f(x, y) = \dfrac{\sin xy}{x^2 + y^2 + 1}$ is continuous everywhere;
(d) f(x,y) = — — is continuous everywhere except at (0, 0) and (1, 1)
and along x = 1 and y = 0.
3-6 A useful connecting theorem
By now it will have become apparent that there is a strong connection
between theorems concerning limits of sequences and the corresponding
theorems concerning limits of functions. In fact, with only trivial modification,
most limit theorems that are true for sequences are also true for functions.
Naturally this is no coincidence and the reason is explained by this connecting
theorem.
Theorem 3-6 Let $f(x)$ be a function defined for all x in some interval
a < x < b. Further, let $\{x_n\}$ be a sequence defined in the same interval which
converges to a limit $\alpha$ that is not a member of the sequence. Then if, and only
if, $\lim_{n \to \infty} f(x_n) = L$ for each such sequence $\{x_n\}$, it follows that $\lim_{x \to \alpha} f(x) = L$.
The proof of this connecting theorem comprises two distinct parts. First
it must be established that if $\lim_{x \to \alpha} f(x) = L$, then sequences $\{x_n\}$ exist having
the required property. Second, the converse result must be proved; that if the
required sequences $\{x_n\}$ exist, then $\lim_{x \to \alpha} f(x) = L$. Together, these two results
will ensure that the theorem works in both directions, so that corresponding
function and sequence limit theorems satisfying the necessary conditions may
be freely interchanged without further question.
The first part of the proof is a direct consequence of Definitions 3-3 and
3-5. It follows from Definition 3-5 that when x is confined to some neighbourhood
$N_\alpha$ of $\alpha$, then $f(x)$ is confined to a neighbourhood $N_L$ of L. From
Definition 3-3, since $\{x_n\}$ has the limit $\alpha$, there must be some number $n_0$
such that for $n > n_0$ the terms $x_n$ lie in $N_\alpha$, and so it follows that $f(x_n)$ will
also be confined to the same neighbourhood $N_L$ of L.
The second step is a little harder, since it involves an indirect proof by
contradiction. It involves showing that if we assume that $\lim_{x \to \alpha} f(x) \ne L$,
then a sequence $\{z_n\}$ can be found satisfying all the requirements of the
theorem, for which $\lim_{n \to \infty} f(z_n) \ne L$. Hence the contradiction showing that
the assumption $\lim_{x \to \alpha} f(x) \ne L$ was false. We leave the details of this to any
interested reader as an exercise.
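To close this chapter, we shall use this theorem together with geometrical
arguments to establish the three useful limits:
$$\lim_{\theta \to 0} \left( \frac{\sin \alpha\theta}{\theta} \right) = \alpha; \qquad (3\text{-}9)$$
$$\lim_{\theta \to 0} \left( \frac{1 - \cos \alpha\theta}{\theta} \right) = 0; \qquad (3\text{-}10)$$
$$\lim_{\theta \to 0} \left( \frac{1 - \cos \alpha\theta}{\theta^2} \right) = \frac{\alpha^2}{2}. \qquad (3\text{-}11)$$
These limits are all of the indeterminate variety mentioned earlier and,
although this topic will receive special mention in a subsequent chapter, it is
important for the development of our work that they be examined now. We
shall establish that they are all related to the single limit
$$\lim_{\theta \to 0} \left( \frac{\sin \theta}{\theta} \right) = 1,$$
which we prove first.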
Consider Fig. 3-10 which represents a circular arc of unit radius with its
centre at O, inscribed in the right-angled triangle OAB.
Fig. 3-10 Area inequalities.
Then it is obvious that
Area of triangle OAC < Area of sector OAC < Area of triangle OAB.
Expressed in terms of the angle $\theta$ measured in radians this becomes
$$\tfrac{1}{2} \sin \theta < \tfrac{1}{2} \theta < \tfrac{1}{2} \tan \theta,$$
from which we see that
$$\cos \theta < \frac{\sin \theta}{\theta} < 1. \qquad \text{(A)}$$
This result must be true for all acute angles and, in particular, for the
values of the sequence $\{\theta_n\}$ defined by $\theta_n = 1/n$. Thus (A) takes the form
$$\cos \theta_n < \frac{\sin \theta_n}{\theta_n} < 1 \qquad \text{(B)}$$
and, since $\lim_{n \to \infty} \theta_n = 0$ where the limit is not a member of the sequence, we
may combine Theorems 3-2 and 3-6 to deduce that
$$\lim_{\theta \to 0} \left( \frac{\sin \theta}{\theta} \right) = 1. \qquad (3\text{-}12)$$
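To establish limit (3-9) it is only necessary to replace $\theta$ in Eqn (3-12) by
$\alpha\theta$, giving rise to
$$\lim_{\theta \to 0} \left( \frac{\sin \alpha\theta}{\alpha\theta} \right) = 1$$
or, equivalently,
$$\lim_{\theta \to 0} \left( \frac{\sin \alpha\theta}{\theta} \right) = \alpha.$$
The limits (3-10) and (3-11) then follow by using the identity
$1 - \cos \alpha\theta = 2 \sin^2 \tfrac{1}{2}\alpha\theta$ to form the expressions
$$\frac{1 - \cos \alpha\theta}{\theta} = 2 \sin \tfrac{1}{2}\alpha\theta \left( \frac{\sin \tfrac{1}{2}\alpha\theta}{\theta} \right),$$
and
$$\frac{1 - \cos \alpha\theta}{\theta^2} = 2 \left( \frac{\sin \tfrac{1}{2}\alpha\theta}{\theta} \right)^2.$$
Applying result (3-9) to these we finally arrive at the required results
$$\lim_{\theta \to 0} \left( \frac{1 - \cos \alpha\theta}{\theta} \right) = 0$$
and
$$\lim_{\theta \to 0} \left( \frac{1 - \cos \alpha\theta}{\theta^2} \right) = 2 \left( \frac{\alpha}{2} \right)^2 = \frac{\alpha^2}{2}.$$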
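The three limits (3-9) to (3-11) can be examined numerically for a particular constant. The sketch below is an illustration only: the choice of 3 for the constant and the function names are ours. It forms $1 - \cos$ through the half-angle identity used above, because the direct subtraction loses accuracy in floating-point arithmetic when the angle is very small.

```python
import math

def one_minus_cos(u):
    """1 - cos(u) computed as 2*sin^2(u/2) to avoid cancellation for small u."""
    return 2.0 * math.sin(0.5 * u) ** 2

def limit_3_9(alpha, theta):
    return math.sin(alpha * theta) / theta

def limit_3_10(alpha, theta):
    return one_minus_cos(alpha * theta) / theta

def limit_3_11(alpha, theta):
    return one_minus_cos(alpha * theta) / theta ** 2

alpha = 3.0
for k in range(1, 7):
    theta = 10.0 ** (-k)
    print(f"theta = {theta:.0e}   {limit_3_9(alpha, theta):.8f}   "
          f"{limit_3_10(alpha, theta):.8f}   {limit_3_11(alpha, theta):.8f}")
```

With the constant equal to 3 the three columns should tend to 3, 0 and 9/2 respectively.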
The following general result is sometimes useful and, as we shall show
by example, may be combined with Eqns (3-9) to (3-11) to give a number
of interesting results.
Suppose $f(x)$ and $g(x)$ are two functions such that $\lim_{x \to a} f(x) = \alpha$ and
$\lim_{x \to a} g(x) = \beta$, where $\alpha$ and $\beta$ are both finite and $\alpha > 0$, so that the
power $\alpha^\beta$ is defined. Then, clearly,
$$\lim_{x \to a} [f(x)]^{g(x)} = \left[ \lim_{x \to a} f(x) \right]^{\lim_{x \to a} g(x)} = \alpha^\beta.$$
This result, which is true in general, is of course also true when one or
more of the limits involved is of the form Eqns (3-9) to (3-11).
Example 3-7
(a) $\lim_{x \to 1} \left( \dfrac{x^3 + 2x^2 + x + 1}{x^2 + 2x + 3} \right)^{[1 - \cos 2(x - 1)]/(x - 1)^2}$;
(b) $\lim_{x \to 0} \left( \dfrac{1 - \cos 3x}{x^2} \right)^{(\sin 2x)/x}$.
Solution to (a) Here $f(x) = (x^3 + 2x^2 + x + 1)/(x^2 + 2x + 3)$, so that
$\lim_{x \to 1} f(x) = 5/6$ and as $g(x) = [1 - \cos 2(x - 1)]/(x - 1)^2$, it follows from
Eqn (3-11) that $\lim_{x \to 1} g(x) = 2$. Hence, $\lim_{x \to 1} [f(x)]^{g(x)} = (5/6)^2 = 25/36$.
Solution to (b) In this case $f(x) = (1 - \cos 3x)/x^2$ and $g(x) = (\sin 2x)/x$.
A direct application of Eqns (3-9) and (3-11) then shows that $\lim_{x \to 0} f(x) = 9/2$
and $\lim_{x \to 0} g(x) = 2$ and thus $\lim_{x \to 0} [f(x)]^{g(x)} = (9/2)^2 = 81/4$.
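The two answers of Example 3-7 can be compared with direct evaluation close to the limit points. The sketch below is a rough check under our own assumptions: the function names and the offset $10^{-6}$ are ours, and $1 - \cos u$ is again formed as $2\sin^2 \tfrac{1}{2}u$ to limit rounding error.

```python
import math

def one_minus_cos(u):
    """1 - cos(u) written as 2*sin^2(u/2), which is accurate for small u."""
    return 2.0 * math.sin(0.5 * u) ** 2

def example_a(x):
    base = (x ** 3 + 2 * x ** 2 + x + 1) / (x ** 2 + 2 * x + 3)
    exponent = one_minus_cos(2.0 * (x - 1.0)) / (x - 1.0) ** 2
    return base ** exponent

def example_b(x):
    base = one_minus_cos(3.0 * x) / x ** 2
    exponent = math.sin(2.0 * x) / x
    return base ** exponent

h = 1e-6
print("(a) near x = 1:", example_a(1.0 + h), "  expected 25/36 =", 25 / 36)
print("(b) near x = 0:", example_b(h), "  expected 81/4 =", 81 / 4)
```

Evaluation at a single nearby point only suggests the limit; the argument given in the solutions is what establishes it.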
PROBLEMS
Section 3-1
3-1 Give an example of a numerical sequence and of a non-numerical sequence.
3-2 Use the terms bounded, unbounded, strictly monotonic increasing, and
strictly monotonic decreasing to classify the sequences {u n } which have the
following general terms:
(a) ii. = (-«)«+!; (b) u n
-(-?'
(c) u„ = sin (1/n); (d) u n = 2 + (-1)";
, s n + 1 . 2n + 3
(e) u n = = — —z> (f) « n = — — 7-
In + 3 « + 1
3-3 Give an example of each of the following types of sequence:
(a) bounded; (b) strictly monotonic decreasing; (c) monotonic decreasing;
(d) strictly monotonic increasing; (e) bounded above; (f) bounded below.
3-4 Use an ordinary graph to plot the first ten terms of the sequence {u n } for which
k» = (-J)»(» + 2)/«.
3-5 Using the device described in connection with Fig. 3-1 (b) to compress the
horizontal axis, plot the first five terms of the sequences {u n } which have the
general terms:
(a) u n = (■
•«" (m--
n J
(b) Un = 1 + 2 ~y
r = l r-
Section 3-2
3-6 Find a neighbourhood (a, b) of the sequence $\{1 + (-1)^n/n\}$ such that
(a) there are 100 terms outside it;
(b) there are 10,000 terms outside it.
Deduce that there are infinitely many terms inside any such neighbourhood.
3-7 Find a neighbourhood (a, b) of the sequence $\{(2n + 1)/n\}$ such that
(a) there are 10 terms outside it;
(b) there are 1,000 terms outside it.
3-8 Name the limit points of the sequence $\{u_n\}$ which has the general term
$u_n = \sin [(n + 1)\pi/2]$. Identify the sub-sequences that determine these limit
points.
3-9 Name the limit points of the sequence $\{u_n\}$ with the general term
$u_n = \sin [(n^2 + n + 1)\pi/2n]$. Identify the sub-sequences that converge to these
limit points.
3-10 Give examples of sequences having (a) no limit point, (b) one limit point,
(c) two limit points.
3-11 Name the limit points of the sequence {u n } which has the general term
1 ~ 32S for " even
Un= {
——7 for n odd.
State whether or not the limit points belong to the sequence.
3-12 Determine the following limits:
(a) hm ;
n— oo n 3
n* i im (2» 8 + n - !)(« + 2) .
{h) l™ (3»» + 7«+ll) '
(c) lim ;; — TTTn'
... ,. n + (-2)"
(d) hm ; — — -;
.. ,. /l 2 + 2 2 + 3 2 +- • - + tfl\
(e) ^ ( W J"
3-13 Give an expression for the nth term of the sequence $\sqrt{2}$, $\sqrt{2\sqrt{2}}$,
$\sqrt{2\sqrt{2\sqrt{2}}}$, .... Use your result to deduce the limit of the sequence.
3-14 Determine the limits :
(a) $\lim_{n \to \infty} (\sqrt{n + a} - \sqrt{n})$, where a > 0 is any real number;
„ „ ,. nil sin n — 3 cos 2n)
(b) lim ;
„—«, n 2 + 2/i + 1
(3«+2 -(. 5«+2\
(d) lim »-v/(l + "")(a ^ °)-
n— > oo
3-15 Use the O notation to express the behaviour of the following expressions for
large x:
(a) 2x 2 + x + sin (1/x); (b) 3 + -;
, N 3x 3 + 2x + 1 ,., * 3 sin x + 1
(C) x 2 +l ; (d) x3 + 3 ;
x 2
(C) V(* 3 +X+1)
3-16 Suppose that the sequences $\{u_n\}$, $\{v_n\}$, and $\{w_n\}$ are such that $u_n < w_n < v_n$
for all n greater than some fixed number $n_0$, and that $\{u_n\}$ converges to the limit
L and $\{v_n\}$ converges to the limit M. Show by example that the sequence $\{w_n\}$
need not converge to a limit.
3-17 Outline the details of the proof of Theorem 3-2. (Hint: Consider the limits of
the sequences $\{u_n - w_n\}$ and $\{w_n - v_n\}$.)
3-18 Give two different proofs of the convergence of the sequence {u n } in which
11 11
u n =l+~ 3 + ^ + • • - + 3^1+ —^> appealing first to Theorem 3-1 (a)
and then to Theorem 3-2.
3-19 Use Theorem 3-2 to prove the convergence of the sequence {u n } in which
1 / 1\ w 2 / 2\ -n 3 . / 3\ w
«„ = - 2 sm^l+-j-+- 2 s 1 n^H--j- + -s.n^l+-j- + ---
n — 1 . / , n — 1\ w
sin 1+ -.
n \ n J 2
3-20 Let $\{u_n\}$ be an increasing sequence bounded above by m. Let this bound m,
together with the members $u_n$ of the sequence, be represented by points on a
line. Then, either the mid-point $m_1 = \tfrac{1}{2}(u_1 + m)$ of the line segment between
$u_1$ and m is an upper bound of $\{u_n\}$, or it is not. According as $m_1$ is, or is not,
an upper bound of $\{u_n\}$, take for the next point $m_2$ the mid-point of the half
line segment to the left or right of $m_1$, respectively. Next, according as $m_2$ is,
or is not, an upper bound of $\{u_n\}$, take for the next point $m_3$ the mid-point of
the quarter line segment to the left or right of $m_2$, respectively. Repeat this
process indefinitely to generate an infinite sequence of points $\{m_r\}$ as indicated
in the diagram.
[Diagram: points on a line, showing the bound m and the limit L of $\{u_n\}$.]
Give reasons why
(a) $\{m_r\}$ has a single limit point L;
(b) the fact that $\{u_n\}$ is an increasing sequence implies that $\lim_{n \to \infty} u_n = L$.
3-21 Let $u_n = \tfrac{1}{2}(u_{n-1} + (a/u_{n-1}))$ and $v_n = (u_n - \sqrt{a})/(u_n + \sqrt{a})$, where $u_1$ and
a are any positive numbers. By showing that $v_n = v_{n-1}^2 = v_{n-2}^4 = v_{n-3}^8
= \cdots = v_1^{2^{n-1}}$, deduce the result $0 \le v_n \le |v_1|^n$. Then, using
Theorem 3-2, prove that $\lim_{n \to \infty} v_n = 0$ thereby establishing that $\lim_{n \to \infty} u_n = \sqrt{a}$.
3-22 Using the algorithm $u_n = \tfrac{1}{2}\left(u_{n-1} + \dfrac{3}{u_{n-1}}\right)$, compute to four figures the first
five terms in the sequence $\{u_n\}$ corresponding to the starting values (a) $u_1 = 1$,
(b) $u_1 = 2$. Compare your results with the limiting value $\sqrt{3}$.
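The computation asked for here can be mechanized. The following is only a sketch: it assumes the algorithm is the iteration of Problem 3-21 with a = 3, and the function name `sqrt_iteration` and its arguments are our own choices for illustration.

```python
import math


def sqrt_iteration(a, u1, terms):
    """Return the first `terms` members of u_n = (u_{n-1} + a / u_{n-1}) / 2."""
    sequence = [u1]
    while len(sequence) < terms:
        u_prev = sequence[-1]
        sequence.append(0.5 * (u_prev + a / u_prev))
    return sequence


if __name__ == "__main__":
    # Starting values (a) u_1 = 1 and (b) u_1 = 2, printed to four figures.
    for start in (1.0, 2.0):
        values = sqrt_iteration(3.0, start, 5)
        print(start, [f"{u:.4g}" for u in values], "limit", f"{math.sqrt(3.0):.4g}")
```

Printing to four significant figures mirrors the wording of the problem; the stored values keep full floating-point precision.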
3-23 Using the algorithm u n = Mu n -i H A compute to four figures the first
five terms in the sequence $\{u_n\}$ corresponding to the starting values (a) $u_1 = 1$,
(b) $u_1 = 2$. Compare your results with the limiting value $\sqrt[3]{5}$.
Section 3-3
The following two related problems show how the approximate behaviour of $e^x$ in
the interval $-2 < x < 2$ may be inferred directly from the sequence $\{v_n(x)\}$.
3-24 Define $v_n(x)$ by the expression
$$v_n(x) = \left(1 + \frac{x}{n}\right)^n.$$
Use essentially the same arguments as those leading to Eqn (3-4) to prove
that $\{v_n(x)\}$ is a strictly increasing sequence for any fixed positive x and then
show that
$$v_n(x) < 1 + x + \frac{x^2}{2} + \frac{x^3}{2^2} + \cdots + \frac{x^n}{2^{n-1}}.$$
By summing this expression and taking the limit as $n \to \infty$ deduce that
$$1 < e^x < \frac{2 + x}{2 - x} \quad \text{for } 0 < x < 2.$$
Compare this result with Fig. 3-3.
3-25 Using the same definition of $v_n(x)$ as above, form the sub-sequences $\{v_{2m}(x)\}$
of even terms and $\{v_{2m+1}(x)\}$ of odd terms. Modify slightly the arguments used
in the previous example to prove that both sub-sequences are strictly monotonic
decreasing for negative x. Show that $\{v_{2m+1}(x) - v_{2m}(x)\}$ is a null
sequence and hence deduce that both the even and odd sequences tend to the
same limit. Modify $v_{2m}(x)$ to establish that
x 2 x 3 x 2m
V 2m (x) >\~ X+ ---+■■ ■+ ^n
By summing this expression and taking the limit as ;/ -»• co deduce that
2 — x
< e* < 1 for < x < 2.
2 + x ~ ~
Compare this result with Fig. 3-3.
Section 3-4
3-26 Determine the following limits of functions:
(a) lim x 3 - x 2 + x + 1 ; (b) lim * t *"!" S
z~a *~*3 x 3 — 1
/ ^ r V(x2 ~ 6) mm- x 3 + x 2 ~x-2
(c) hm — ; (d) lim
.3 x 2 + 1 ' v ^__ 2 (x + \)(x + 2)
(e) lim - + A)3 ~ * -; (f) lim {\/(x 2 + 1000) - VO 2 - 1000)};
(g) lim x[V(.x 2 + 3) - jc].
Determine these limits whei
(x 3 + x - 1 f(
(a) \imf(x) where /(x) = (
«-i 1 1 + sin (.v —
:r-*cc
3-27 Determine these limits when they exist:
- 1 for x < 1
1) for x > 1 ;
x - 1
(b) lim
f.
(c) lim/(x) where f(x) = (
r
x 2 + sin i « for x < 3
4 + x 2 for x > 3 ;
(d) lim |x 2 -l |; (e) lim ? + C ° S * ■
x-~i x-~\* 1 - sm x
3-28 Determine the left- and right-hand limits of these functions at the stated
points:
3^+1 _j. 5^+1
(a) lim — — ;
*-2± 3* + 5*
,w> r /•/■-> u rr ^ f 1 + 2 sin* for .t <\*
(b) hm f(x) where /(a-) =j
x—u± |cosec x for x > \-n\
(c) lim I x 2 + x - 1 I ;
x— 2±
-2 for x <
(d) lim /"(*) where /'(jf) =
,^o± 7W yv U+-|x|forjc>0;
(e) lim -^— •
z->-3± J — •*
3-29 Determine the domains of definition for which these functions are continuous :
(a) /(*) = x + M ; (b) f{x) = l/(x 2 - 1) ;
^ « ^ * 5 + x 2 - 1 x s + 4x 2 + x _ 6
(C) * x) = 4 + sinx-2cosx' (d) ^ W = (x - l)(x + 4) ;
!2x + sin x for x # « 77/2
« 2 + 1 . ,.
3-30 Give examples of functions of the following type:
(a) continuous everywhere except at x = 1 and x = 2;
(b) discontinuous at the points $x = n\pi$ with n an integer;
(c) continuous everywhere but neither purely algebraic nor purely trigo-
nometric;
(d) continuous everywhere except at x = 1, where the left-hand limit is —1
and the right-hand limit is 3 ;
(e) continuous everywhere except at x = 1, where the left-hand and right-
hand limits both equal 2.
3-31 Suppose it is known that a function f(x) is continuous over the interval
$x_0 \le x \le x_2$, and that $f(x_0) = y_0$, $f(x_1) = y_1$ and $f(x_2) = y_2$. Explain why
it is reasonable to assume that when the functional values $y_0$, $y_1$, and $y_2$ are
reasonably close together, f(x) may in some sense be represented by the
expression
$$f(x) \simeq \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}\,y_0 + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)}\,y_1 + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}\,y_2.$$
Any formula such as this, from which the behaviour of a function over an
interval is inferred from its behaviour at specific points in that interval, is
called an interpolation formula. This particular one is called the three point
Lagrangian interpolation formula and we shall see later that it gives exact
results when applied to any linear or quadratic function f(x). Considering
y = sin x for $0 \le x \le 3\pi$, explain how this formula might give misleading
results.
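The three point Lagrangian interpolation formula is easily tried out numerically. The sketch below is illustrative only: the function name `lagrange3` and the choice of the points 0, 3π/2, and 3π for y = sin x are our own assumptions, not part of the problem.

```python
import math


def lagrange3(x, xs, ys):
    """Three point Lagrangian interpolation through (xs[i], ys[i]), i = 0, 1, 2."""
    x0, x1, x2 = xs
    y0, y1, y2 = ys
    term0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2)) * y0
    term1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2)) * y1
    term2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)) * y2
    return term0 + term1 + term2


if __name__ == "__main__":
    # Hypothetical widely spaced points on 0 <= x <= 3*pi for y = sin x.
    xs = (0.0, 1.5 * math.pi, 3.0 * math.pi)
    ys = tuple(math.sin(x) for x in xs)
    for x in (0.5 * math.pi, math.pi, 2.5 * math.pi):
        approx = lagrange3(x, xs, ys)
        print(f"x = {x:.4f}: interpolated {approx:+.4f}, sin x = {math.sin(x):+.4f}")
```

Comparing the two printed columns shows how far a single quadratic through three widely spaced points can depart from an oscillating function between those points.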
3-32 Apply the expression given in Problem 3-31 to the function y = sin x, taking
as the points $x_0$, $x_1$, and $x_2$ the respective radian arguments 0.6, 0.9, and 1.2
and so find the appropriate three point Lagrangian interpolation formula over
the interval $0.6 \le x \le 1.2$. Use your result to deduce approximate values for
sin 0.8 and sin 1.1 and compare these with the exact tabulated values.
3-33 Repeat the previous problem, but this time take $x_0 = 0.4$, $x_1 = 1.2$, and
$x_2 = 1.7$ and deduce approximate values for sin 0.9 and sin 1.5. Compare
your results with the exact tabulated values.
3-34 Consider the continuous function f(x) defined on the interval [0, 2] by the
rule f(x) = x for $0 \le x \le 1$ and f(x) = 2 − x for $1 \le x \le 2$. Taking
$x_0 = 0.2$, $x_1 = 0.8$, $x_2 = 1.3$, apply the expression given in Problem 3-31 in
order to find an interpolation formula over the interval [0.2, 1.3]. Compare
the approximate and exact values at x = 0.5, 0.7, and 1.0.
3-35 The density of the material of a rod of length L is a function f(x) of the distance
x measured from one end. Describe in physical terms rods that are characterized
by the following functions f(x):
(a) f(x) = constant for $0 \le x \le L$;
I pi for < x < §L
(b) /(*)="
(c) $f(x) = \rho(1 + kx)$ for $0 \le x \le L$.
3-36 If the function f(x) has the same meaning as above, specify the functional
forms it must take in order that it describes:
(a) a rod of length L having constant density $\rho_1$ over half its length and a
density that changes steadily (that is, linearly) with distance from $\rho_1$ to
$\rho_2$ over the remaining half of the rod;
(b) a rod of length L comprising three sections of equal length with constant
densities $\rho_1$, $\rho_2$, and $\rho_3$ in the respective sections;
(c) a rod of length L having a density that increases quadratically with x
(that is, like the square of x) from $\rho_1$ at x = 0 to $\rho_2$ at x = L.
Section 3-5
3-37 Let f(x, y) denote the density of the material at the point (x, y) of a thin flat
plate in the (x, y)-plane. Give the functional forms of f(x, y) in order that it
should describe:
(a) a circular plate of radius R centred at the origin, with the material to the
left of the y-axis having a density $\rho_1$ and the material to the right a density
$\rho_2$;
(b) a circular disc of inner radius R and outer radius 3R in which the density
is constant and equal to p out to a circle of radius 2R, after which it
decreases linearly to the value Jp at the outer edge of the disc;
(c) an isosceles triangle with its apex at the origin and sides of length L
lying to the right of the y-axis and inclined at angles \-* and —\t, respec-
tively, to the x-axis, with the material above the x-axis having a density
Pi and the material below the x-axis having a density P2.
3-38 Let point P have the Cartesian coordinates (1, 1), and let $N_1$ denote the unit
circle drawn with P as its centre. Define $N_r$ to be a circle, concentric with $N_1$,
and let us agree to write $N_{r+1} \subset N_r$ if the circle $N_{r+1}$ is contained within the
circle $N_r$. Then $N_{r+1} \subset N_r$, for all r, describes a family of neighbourhoods of
the point P. Give examples of families $\{N_r\}$ of neighbourhoods of P that:
(a) have the property that lim (radius of Nr) ->- J;
r— ►co
(b) have the property that lim (radius of N r ) ->- 0;
r— *-co
(c) have the property that; area N r +i = J area N r and lim (area of N r ) -> \.
r->co
3-39 State the largest neighbourhood about the stated points P in which the
following functions are defined. Also state if they are defined at P and on the
boundary of the neighbourhood :
(a) f(x,y) = \l{xy(lx - l)(y + 2)(x + l)(y - 2)} taking point P as (- 1, 2);
(b) /(x, v) = _ a _ 2 taking point P as (0, 0);
1 + x 2 + y
3-40 Determine these limits when they exist :
(c) f(x,y) = 1 ^ taking point P as (2, 3).
/ ^ ,. ^ 2 y 2x 2 + xv + 1
(a) ^ 2, 2 + 2v 2 +i ; (b) ," m K , 2 + 2,; + , 2 ;
!/— 2 J/--.2
(c) ,im fr-')™* . (d) lim
:r-2 X i — 4 j^o
W— 1 V— Iff
1+2 cos xv + sin xy
2 + xy
3-41 Give examples of functions f{x,y) having these properties:
(a) Km f{x, y) = 2; (b) Mm. fix, y) = 0;
W—3 )/-> Jtt
(c) lim fix, y) does not exist.
V—-3
3-42 Find the points or lines of discontinuity of these functions:
(0 for x 2 + y 2 = 1
(a) fix, v) = x sin xv
[ t _ X 2 _ y2 elsewhere;
(3 for x — 1 , y = 2
1+2 , 2+/ elsewhere ;
■ y — 1
,,, ,, , x 2 sin v + y 2 sin x + 2
(d)/(*.,y)= 3c4 / 2jcV+J>4+1 -
3-43 Let P and Q be any two points in the (x, y)-plane. Prove that if the distance
function $\rho(P, Q)$ is taken to be the length of the straight line joining P to Q
then:
(a) $\rho(P, Q) \ge 0$;
(b) $\rho(P, Q) = 0$ if, and only if, P = Q;
(c) $\rho(P, Q) = \rho(Q, P)$;
(d) $\rho(P, R) \le \rho(P, Q) + \rho(Q, R)$, where R is another point distinct from
P and Q.
3-44 Repeat the proof of the previous problem, but this time let P, Q, and R be
points in space.
Section 3-6
3-45 Apply the results of Section 3-6 to determine these limits:
x r , „ ,. 1 — x'2 cos x
(a) lim — -; (b) lim , r _ , A
*-o V(l - cos x)' ,^.„ y'[2 sin (x - i^)]'
sin (x + h) - sin x 2 sin 3 (x/4)
(c) hm ; (d) lim ;
A-m h , r ^o x 3
, . ,. I sin x |
(e) lim
x—o± x
3-46 Apply the results of Section 3-6 to determine these limits:
^ r i 2 _l u , ^ / cos(x + h) - cosx \
(a) lim (x 2 + hx + 1) ;
ii^o \ n ]
„ . ,. sin x — sin a , . ,. / sin «
(b) hm ; (c) lim —
x \ sin xjta'
(1 \ /x 2 •— x + 4\( si " -■ f '/- r
a: sin- ; (e) lim — — - ;
x) .r^o\x 2 - x + 1/
■v_2\ t si " 3(-c - 2)]/(.r - 2)
(f) lim ' X
,_ 2 \x* - 4)
3-47 If h(x) is a function for which $\lim_{x\to a} h(x) = 0$, use Theorem 3-6 to justify writing
$$\lim_{x\to a}\,(1 + h(x))^{1/h(x)} = e.$$
3-48 Let functions f(x) and g(x) be such that $\lim_{x\to a} f(x) = 1$ and $\lim_{x\to a} g(x) \to \infty$, so
that we may write f(x) = 1 + h(x) where $\lim_{x\to a} h(x) = 0$. Then, considering
the function $[f(x)]^{g(x)}$, use the result of Problem 3-47 to show that
$$\lim_{x\to a}\,[f(x)]^{g(x)} = e^{\lim_{x\to a} h(x)g(x)}.$$
3-49 Use the result of Problem 3-48 to determine these limits:
(a) lim (l--Y; (b) lim (^JV;
x^oz \ Xj x ^„ \X + I J
(c) lim ( — ^-r) ; (d) lim (1 + sin 2x) 1/x .
x^cc \X + 1/ x—
3-50 Determine the following limits which do not necessarily require the result of
Problem 3-48:
(a) lim (l + -)l X ; (b) Hm(l + ^ ) ;
I i\(4* + 3)/(x + 2)
(c) lim - ; (d) lim (cos x) :
*-«, \X 2 J x^O
(e) lim (cos x) 1 /"; (0 lim
x^O ' r—0 \ 4 *
,2/**.
Complex numbers and vectors
4-1 Introductory ideas
A number of important properties of the real number system have already
been considered, and we shall now examine to what extent quantities repre-
sentable as displacements in space may be incorporated into a number
system. The name vector quantity is reserved for all quantities that are
representable as a displacement in space or, more exactly, as a directed line
element. Familiar vector quantities are force, magnetic field and velocity,
which are all representable by a line whose length is proportional to their
magnitude and whose direction is parallel to the direction of the original
quantity. In addition, the line of action of a vector has a sense associated
with it, which means that we must specify a direction along the line to
indicate the way in which the vector acts.
Thus to represent a velocity of 3 ft/s in an easterly direction we would
first adopt a convenient length scale, say 1 in to represent 1 ft/s and then,
after marking the points of the compass on our paper, we would draw a line
3 in long in an east-west direction. Finally we would add an arrow to the
line pointing eastwards to indicate the sense of the velocity. This line could
be located anywhere on our paper since it does not represent a velocity that
is associated with any particular point. Reversal of the arrow would corres-
pond to a reversal of the direction of the velocity, so that the line would then
represent a velocity of 3 ft/s in a westerly direction.
Not all quantities are vectors, and another important group are called
scalars. The word scalar describes any quantity that has magnitude but no
direction. Typical scalar quantities which have units are temperature, mass
and pressure. The real numbers are themselves scalars, and are used to describe
the numerical magnitudes of both scalar and vector quantities, irrespective
of whether units may be involved. The terms scalar and vector describe
collectively two important groups of quantities in the real world. It should,
however, be added that they do not jointly give a complete description of
all possible physical quantities. Others exist that are neither scalar nor vector,
though this need not be elaborated here.
In giving meaning to the square root operation when applied to negative
numbers, we shall see that a special kind of two-dimensional vector arises.
Its value in mathematics has proved to be so great that although such vectors
are restricted to describing vector quantities in a plane, they have been given
a special name, complex numbers. Because of this restriction, in addition to
studying complex numbers, we shall need a more general theory of vectors so
that we can describe the cited examples of vector quantities, and any others
that may arise, in all possible situations and not just in a plane.
Despite this limitation of complex numbers, their vector properties are
still important enough in special situations for them to be in this chapter.
Their value elsewhere in mathematics however is even greater, and makes
them a discipline in their own right. The main reason for this is to be found
in their relationship to real numbers and in the consequences of their intro-
duction into functional relationships in the roles of independent and dependent
variables. This latter aspect will be pursued later when we discuss another
valuable geometrical idea, a conformal transformation. In the meantime we
shall develop the vector properties and algebra of complex numbers to the
point of general usefulness in mathematics, postponing until the end of this
chapter the alternative approach that is necessary for study of general
three-dimensional vector quantities. As already mentioned, each is valuable as
a separate discipline, though, as would be expected, each has a separate
notation and, generally, a quite different field of application.
The following introduction to complex numbers is based only on a
knowledge of elementary trigonometric identities, and not until after more
study of the exponential and trigonometric functions will we unify our
treatment of these two topics.
The origin of complex numbers was the desire of eighteenth-century
mathematicians always to be able to compute the roots of polynomials,
even when they are of the form
$$x^2 = -1.$$ (4-1)
It was Leonhard Euler (1707-83) who first recognized that the real number
system was deficient in respect of admitting solutions to all possible poly-
nomials and, in connection with Eqn (4-1), he proposed that a new number i
be introduced to extend the number system. In keeping with the mathematical
beliefs of that period, he called i the unit imaginary number and related it to
real numbers by requiring that
$$i^2 = -1.$$ (4-2)
If we allow the use of this new symbol, then $i = \sqrt{-1}$ is the positive
square root of minus one, whence Eqn (4-1) may be seen to have the two
roots x = i and x = −i. That x = i is a root follows from the definition of i,
whilst x = −i is also a root since $(-i)^2 = (-1)^2 \cdot i^2 = 1 \cdot i^2 = -1$. With
the introduction of i, equations such as
$$x^2 = -k,$$
which are slightly more general than Eqn (4-1), can also be solved. The
equation may be re-expressed in the form $x^2 = k \cdot (-1)$, showing that its
roots are $x = i\sqrt{k}$ and $x = -i\sqrt{k}$, where the positive square root is always
taken. For example, if $x^2 = -9$, then the roots are x = 3i and x = −3i.
The success of Euler's idea lies in the fact that only this one new number
need be introduced to enable solutions to be found to all polynomials, irre-
spective of their degree. As a first step towards seeing this, consider the
quadratic equation
ax 2 + bx + c = 0, (4-3)
and suppose that $b^2 - 4ac < 0$. Then, setting $4ac - b^2 = m^2$, and formally
applying the usual formula for the roots of a quadratic, we obtain
$$x = \frac{-b \pm \sqrt{-m^2}}{2a}$$
or
$$x = \left(\frac{-b}{2a}\right) \pm i\left(\frac{m}{2a}\right).$$
Hence, denoting the two roots by $x_1$ and $x_2$, they take the form
$$x_1 = \left(\frac{-b}{2a}\right) + i\left(\frac{m}{2a}\right) \quad \text{and} \quad x_2 = \left(\frac{-b}{2a}\right) - i\left(\frac{m}{2a}\right).$$ (4-4)
The numbers $x_1$ and $x_2$ are not ordinary numbers since each comprises the
sum of a real number and a multiple of the unit imaginary number i. On this
basis it is reasonable to conjecture that each root of any arbitrary polynomial
will be of the same form and, should the multiplier of i be zero, that root
will reduce to a real number.
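As a numerical illustration of the form (4-4), the following sketch uses the complex arithmetic built into Python. The function name `quadratic_roots` and the sample quadratic $x^2 + 2x + 5 = 0$ are our own choices and do not come from the text.

```python
import cmath


def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 for real a != 0, returned as complex numbers."""
    disc = b * b - 4 * a * c
    root_disc = cmath.sqrt(disc)  # equals i*m when disc = -m**2 < 0
    return (-b + root_disc) / (2 * a), (-b - root_disc) / (2 * a)


if __name__ == "__main__":
    # Here b^2 - 4ac = -16, so m = 4 and the roots are -b/2a +/- i*m/2a.
    x1, x2 = quadratic_roots(1, 2, 5)
    print(x1, x2)
```

Each root is the sum of the real number −b/2a and a multiple of i, in agreement with (4-4); when the discriminant is not negative the imaginary parts are zero and the roots reduce to real numbers.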
This conjecture is correct, but before we may verify it, we must see how to
perform arithmetic on numbers of this special type. These are the complex
numbers already mentioned and, henceforth, we shall always refer to them
by this name. Unless the exact form of a complex number is needed, it is
useful to denote it by a single symbol, usually z, so that an arbitrary complex
number z is of the form
z = x + iy, (4-5)
where x and y are real numbers. We call Eqn (4-5) the real-imaginary form
of a complex number, and refer to x as the real part of z, and to y as the
imaginary part of z. In symbolic form we write
x = Re z, y = Im z. (4-6)
Hence if $z = 4 - 7i$, then Re z = 4 and Im z = −7. We stress that Re z
and Im z are real numbers. The zero complex number is denoted by 0 and
represents the number $z = 0 + i \cdot 0$.
Already, and without proper justification, we have attributed some
reasonable arithmetic properties to i. We have, for example, assumed results
such as $\alpha i = i\alpha$ for all real $\alpha$, and $\sqrt{-\alpha} = \sqrt{-1} \cdot \sqrt{\alpha} = i\sqrt{\alpha}$. To proceed
logically and rigorously it would be necessary to define addition, subtraction,
multiplication, and division for complex numbers and then to examine the
applicability of the real number axioms of Chapter 1 in the case of complex
numbers. This is necessary since whatever the arithmetic laws we now propose
for complex numbers, they must obviously be in agreement with the real
number axioms of Chapter 1, whenever the imaginary parts of complex
numbers are zero. We shall not in fact justify the complex number axioms we
now formulate, since this is a straightforward matter and provides good
exercise for the student (see the problems at the end of the chapter). Instead,
we simply summarize the results, pausing only to discuss in detail the most
basic operations necessary for the manipulation of complex numbers.
4-2 Basic algebraic rules for complex numbers
First we shall agree to denote addition and subtraction of the complex
numbers $z_1$ and $z_2$ in the usual manner by writing $z_1 + z_2$ and $z_1 - z_2$,
respectively. Multiplication of the complex numbers $z_1$ and $z_2$ will be denoted
by juxtaposition thus, $z_1 z_2$. Before going on, and in order to work with
equations, we must define the meaning of equality between two complex
numbers, and then we can define the operations of addition, subtraction, and
multiplication. The following definitions are all phrased in terms of the
arbitrary complex numbers $z_1 = a + ib$ and $z_2 = c + id$.
Definition 4-1 We shall say that the two complex numbers $z_1$ and $z_2$ are
equal, and will write $z_1 = z_2$ if, and only if, a = c and b = d. That is if,
and only if, their real parts and their imaginary parts are separately equal.
Example 4-1 Of the complex numbers $z_1$, $z_2$, and $z_3$ defined by $z_1 = 3 - 2i$,
$z_2 = 1 + 3i$, and $z_3 = 3 - 2i$, it is obvious that $z_1 = z_3$ but that $z_1 \ne z_2$
and $z_2 \ne z_3$.
Definition 4-2 By the sum $z_1 + z_2$ will be understood the single complex
number which written in real-imaginary form has a real part that is the sum
of the real parts of $z_1$ and $z_2$, and an imaginary part that is the sum of the
imaginary parts of $z_1$ and $z_2$. Thus for the stated numbers $z_1$ and $z_2$ we have
$$z_1 + z_2 = (a + c) + i(b + d).$$
Example 4-2 If $z_1 = 2 + i$ and $z_2 = 1 - 3i$, then $z_1 + z_2 = 3 - 2i$.
Definition 4-3 By the difference $z_1 - z_2$ will be understood the single
complex number which written in real-imaginary form has a real part that
is the difference of the real parts of $z_1$ and $z_2$ and an imaginary part that is the
difference between the imaginary parts of $z_1$ and $z_2$. Thus for the stated
numbers $z_1$ and $z_2$ we have
$$z_1 - z_2 = (a - c) + i(b - d).$$
Example 4-3 If $z_1 = 5 + 6i$ and $z_2 = 4 - 2i$, then $z_1 - z_2 = 1 + 8i$.
Using these definitions it is easily verified that axioms A-1 to A-5 of
Chapter 1 also apply to complex numbers. To proceed to an examination of
the other axioms we must define the operation of multiplication.
Definition 4-4 The product $z_1 z_2$, in which $z_1 = a + ib$ and $z_2 = c + id$,
is a single complex number which may be written in real-imaginary form.
The product is carried out algebraically as would be the ordinary product
$(\alpha + \beta)(\gamma + \delta)$, and the final result is obtained by making the identifications
$\alpha = a$, $\beta = ib$, $\gamma = c$, $\delta = id$ and using the result $i^2 = -1$ to combine the
four terms that result into a real part and an imaginary part. Thus we have
$$z_1 z_2 = (a + ib)(c + id) = ac + iad + ibc + i^2 bd = (ac - bd) + i(ad + bc).$$
Example 4-4 If $z_1 = 2 + 3i$ and $z_2 = 1 - i$, then $z_1 z_2 = 5 + i$. As a more
difficult example let us express $(1 + i)^4 + (1 - i)^4$ in real-imaginary form.
Now $(1 + i)^4 = (1 + 4i + 6i^2 + 4i^3 + i^4)$ and $(1 - i)^4 = (1 - 4i + 6i^2
- 4i^3 + i^4)$, but as $i^2 = -1$, $i^3 = -i$, and $i^4 = 1$, these expressions become
$(1 + i)^4 = -4$ and $(1 - i)^4 = -4$. Hence $(1 + i)^4 + (1 - i)^4 = -8$.
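The rules for addition, subtraction, and multiplication can be checked mechanically. In the sketch below a complex number a + ib is represented by the pair (a, b); this representation and the function names `add`, `sub`, `mul`, and `power` are assumptions made purely for illustration.

```python
def add(z1, z2):
    """Sum of two complex numbers given as (real, imaginary) pairs."""
    (a, b), (c, d) = z1, z2
    return (a + c, b + d)


def sub(z1, z2):
    """Difference z1 - z2 of two (real, imaginary) pairs."""
    (a, b), (c, d) = z1, z2
    return (a - c, b - d)


def mul(z1, z2):
    """Product (a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    (a, b), (c, d) = z1, z2
    return (a * c - b * d, a * d + b * c)


def power(z, n):
    """z raised to the non-negative integer power n by repeated multiplication."""
    result = (1, 0)
    for _ in range(n):
        result = mul(result, z)
    return result


if __name__ == "__main__":
    print(add((2, 1), (1, -3)))                          # Example 4-2
    print(sub((5, 6), (4, -2)))                          # Example 4-3
    print(mul((2, 3), (1, -1)))                          # Example 4-4
    print(add(power((1, 1), 4), power((1, -1), 4)))      # (1 + i)^4 + (1 - i)^4
```

Working with pairs rather than a built-in complex type keeps each rule visible as an operation on real and imaginary parts.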
The definitions of addition, subtraction, and multiplication of complex
numbers are used in the obvious manner for the solution of simple equations.
Thus, if $2z - (2 + i) = 4 - 3i$, then adding $(2 + i)$ to both sides of the
equation gives $2z + 0 = (4 - 3i) + (2 + i)$ or $2z = 6 - 2i$, whence $z = 3 - i$.
In all cases, the reader should memorize the method employed in the
definitions, and not the quoted formulae.
With this definition of multiplication it is a simple matter to verify that
axioms M-1 to M-4 and also axiom D-1 apply to complex numbers. When
one of the numbers $z_1$ or $z_2$ reduces to a real number, then the real and
imaginary parts of the other are both scaled by the same factor. If the scale
factor is −1 the sign of the complex number is reversed. To discuss axiom
M-5 and division we need to proceed more carefully.
As it stands, an expression such as (a + ib)/c is well defined as a complex
number, for we may regard (1/c) as a multiplier of (a + ib) and, provided
$c \ne 0$, Definition 4-4 will give the result. In this case a and b are both scaled
by the factor (1/c). However, it is not clear that the more general expression
$$z_3 = \frac{z_1}{z_2} = \frac{a + ib}{c + id},$$ (4-7)
is reducible to a complex number expressible in real-imaginary form. The
key to this problem is to be found in M-5 itself when we recall that division
is really defined as the operation inverse to multiplication. Hence, we must
rewrite Eqn (4-7) in the equivalent form
$$z_3(c + id) = a + ib,$$ (4-8)
and then try to determine $z_3$. Now it is easily verified that any complex
number $\alpha + i\beta$ when multiplied by the associated complex number $\alpha - i\beta$
gives the real number $\alpha^2 + \beta^2$. Hence, if both sides of Eqn (4-8) are multiplied
by (c − id), the multiplier of $z_3$ will simply become the real number $c^2 + d^2$.
Carrying out this operation, Eqn (4-8) takes the form
$$z_3(c^2 + d^2) = (a + ib)(c - id)$$ (4-9)
whence, dividing by the real number $(c^2 + d^2)$, we find that
$$z_3 = \frac{ac + bd}{c^2 + d^2} + i\,\frac{bc - ad}{c^2 + d^2}.$$ (4-10)
Equation (4-10) is now in the real-imaginary form of a complex number and
is the result of the quotient (4-7). Many books take expression (4-10) as the
formal definition of the quotient (4-7). The definition we shall propose shortly
is equivalent to Eqn (4-10) in all respects, but its form is much easier to
memorize. The simplification is achieved by the introduction of a new and
useful operation called forming the complex conjugate of a complex number.
Definition 4-5 If z = a + ib is an arbitrary complex number, then the
complex number $\bar{z} = a - ib$ is the complex conjugate of z. The symbol $\bar{z}$
is read 'z bar'. Equivalently, we may state that the complex conjugate of a
number is always obtained by changing the sign of the imaginary part of
that number.
With this definition in mind it is easy to show that the following definition
of the quotient $z_1/z_2$ is equivalent to Eqn (4-10).
Definition 4-6 (division) The quotient $z_1/z_2$ of the two complex numbers
$z_1$ and $z_2$ is the complex number $(z_1\bar{z}_2)/(z_2\bar{z}_2)$.
Using this definition it is a straightforward matter to verify axiom M-5 for
complex numbers, provided only that $z_2 \ne 0$.
Example 4-5 We illustrate division by setting $z_1 = 2 + i$ and $z_2 = 3 - 2i$.
Now $\bar{z}_2 = 3 + 2i$ and $z_1/z_2 = (z_1\bar{z}_2)/(z_2\bar{z}_2) = (2 + i)(3 + 2i)/[(3 - 2i)(3 + 2i)]$,
whence $z_1/z_2 = (4 + 7i)/13$. By this same method, an equation of the form
2z(2 + i) = 1 + i is seen to have the solution z = (1 + i)/(4 + 2i)
= (3 + i)/10.
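Division by way of the complex conjugate may be sketched in the same manner. Here a complex number a + ib is again held as the pair (a, b), integer parts are assumed so that exact fractions can be returned, and the names `conjugate`, `mul`, and `divide` are our own.

```python
from fractions import Fraction


def conjugate(z):
    """Complex conjugate: change the sign of the imaginary part."""
    a, b = z
    return (a, -b)


def mul(z1, z2):
    """Product (a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    (a, b), (c, d) = z1, z2
    return (a * c - b * d, a * d + b * c)


def divide(z1, z2):
    """Quotient z1 / z2 formed as (z1 * conj(z2)) / (z2 * conj(z2))."""
    numerator = mul(z1, conjugate(z2))
    denominator, _ = mul(z2, conjugate(z2))  # real number c^2 + d^2
    if denominator == 0:
        raise ZeroDivisionError("the divisor must be a non-zero complex number")
    return (Fraction(numerator[0], denominator), Fraction(numerator[1], denominator))


if __name__ == "__main__":
    print(divide((2, 1), (3, -2)))   # Example 4-5: (2 + i)/(3 - 2i)
    print(divide((1, 1), (4, 2)))    # solution of 2z(2 + i) = 1 + i
```

Because the denominator $z_2\bar{z}_2$ is real, the final step is only a scaling of the real and imaginary parts of the numerator.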
On account of the fact that $\bar{z}$ is an ordinary complex number, its general
properties are exactly the same as those of any other complex number. Hence
the number axioms that apply to z, apply equally well to $\bar{z}$. The following
specially useful results are easily proved, and are related to the arbitrary
complex number z = x + iy, to its complex conjugate $\bar{z} = x - iy$ and to
the real number |z| associated with z and defined to be $|z| = (x^2 + y^2)^{1/2}$.
(See Definition 4-7.)
$z + \bar{z} = 2\,\mathrm{Re}\,z = 2x$;
$z - \bar{z} = 2i\,\mathrm{Im}\,z = 2iy$;
$z\bar{z} = |z|^2$;
$\overline{(1/z)} = 1/\bar{z}$;
$\overline{(z^n)} = (\bar{z})^n$;
$\left|\dfrac{z_1}{z_2}\right| = \dfrac{|z_1|}{|z_2|}$;
$\overline{(z_1 + z_2 + \cdots + z_n)} = \bar{z}_1 + \bar{z}_2 + \cdots + \bar{z}_n$;
$\overline{z_1 z_2 \cdots z_n} = \bar{z}_1 \bar{z}_2 \cdots \bar{z}_n$.
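Several of the results listed above may be spot-checked numerically with Python's built-in complex type. This is a sketch only, and a numerical check for particular values is of course no substitute for a proof; the function name `conjugate_rule_errors` and the sample values 3 − 4i and 1 + 2i are our own choices.

```python
def conjugate_rule_errors(z, w, n=3):
    """Return the absolute error in each rule for the sample values z and w."""
    zb, wb = z.conjugate(), w.conjugate()
    return {
        "z + conj(z) = 2 Re z": abs((z + zb) - 2 * z.real),
        "z - conj(z) = 2i Im z": abs((z - zb) - 2j * z.imag),
        "z conj(z) = |z|^2": abs(z * zb - abs(z) ** 2),
        "conj(z^n) = conj(z)^n": abs((z ** n).conjugate() - zb ** n),
        "|z/w| = |z|/|w|": abs(abs(z / w) - abs(z) / abs(w)),
        "conj(z + w) = conj(z) + conj(w)": abs((z + w).conjugate() - (zb + wb)),
        "conj(z w) = conj(z) conj(w)": abs((z * w).conjugate() - zb * wb),
    }


if __name__ == "__main__":
    errors = conjugate_rule_errors(complex(3, -4), complex(1, 2))
    for rule, error in errors.items():
        print(f"{rule:35s} error = {error:.2e}")
```

Each printed error should be zero or of the order of floating-point rounding.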
We now utilize some of these simple properties of the complex conjugate
operation to prove an important theorem concerning the roots of a poly-
nomial, and shall then deduce three very useful corollaries. In the process of
doing so, we shall take as self-evident the fact that a polynomial P(z) of
degree n has n factors of the form $(z - \zeta)$. These are called linear factors
because they are of degree 1. The numbers $\zeta$ may, or may not, be complex.
Theorem 4-1 If the nth degree polynomial
$$P(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_n$$
has its coefficients $a_0, a_1, \ldots, a_n$ real, then if $z = \zeta$ is a zero of P(z), so also
is $z = \bar{\zeta}$ a zero of P(z).
Proof Suppose that $z = \zeta$ is a zero of P(z). Then by definition
$$a_0\zeta^n + a_1\zeta^{n-1} + \cdots + a_n = 0.$$
Hence, taking the complex conjugate of this equation we may write
$$\overline{(a_0\zeta^n + a_1\zeta^{n-1} + \cdots + a_n)} = 0.$$
However, the complex conjugate of a sum is the sum of the complex conjugates
of the individual terms comprising the sum so that
$$\overline{(a_0\zeta^n + a_1\zeta^{n-1} + \cdots + a_n)} = \overline{a_0\zeta^n} + \overline{a_1\zeta^{n-1}} + \cdots + \overline{a_n}.$$
Now as the $a_r$, r = 0, 1, . . ., n are real, it follows that $\bar{a}_r = a_r$ and so
$$\overline{a_r\zeta^{n-r}} = a_r\overline{\zeta^{n-r}} = a_r(\bar{\zeta})^{n-r}, \quad \text{for } r = 0, 1, \ldots, n.$$
Hence,
$$a_0\bar{\zeta}^n + a_1\bar{\zeta}^{n-1} + \cdots + a_n = 0,$$
showing that $P(\bar{\zeta}) = 0$. Thus $z = \bar{\zeta}$ is also a zero of P(z).
Paraphrased, Theorem 4-1 asserts that if a polynomial with real coefficients
has complex zeros, then they must occur in complex conjugate pairs.
As any zero which is not complex must be real, it follows that we may
formulate a Corollary to Theorem 4-1.
Corollary 4-1 (a) If a polynomial has real coefficients, then those of its
zeros that are not real, occur in complex conjugate pairs.
If $z = \zeta$ and $z = \bar{\zeta}$ represent any pair of complex conjugate zeros in
Theorem 4-1, then $(z - \zeta)$ and $(z - \bar{\zeta})$ must both be factors of P(z). Hence their
product $(z - \zeta)(z - \bar{\zeta})$ must also be a factor. Now
$$(z - \zeta)(z - \bar{\zeta}) = z^2 - (\zeta + \bar{\zeta})z + \zeta\bar{\zeta},$$
and as $\zeta + \bar{\zeta} = 2\,\mathrm{Re}\,\zeta$ is a real number and $\zeta\bar{\zeta} = |\zeta|^2$ is also a real number,
it follows that the pair of complex conjugate zeros correspond to a single
quadratic factor with real coefficients. Hence Corollary 4-1 (a) may be
re-phrased thus:
Corollary 4-1 (b) Any polynomial with real coefficients may always be
factorized into a set of factors which are linear or at most quadratic, each of
which has real coefficients. Specifically, if the polynomial is of degree n and
there are m pairs of complex conjugate zeros, then there will be (n − 2m)
linear factors with real coefficients and m quadratic factors with real
coefficients.
Finally, as an obvious consequence of this last corollary:
Corollary 4-1 (c) An odd degree polynomial with real coefficients must
have at least one real zero.
The significance of these results is best illustrated by an example which
shows how they may often be used to simplify a difficult problem to the
point at which the solution may be determined by familiar methods.
Example 4-6 A polynomial P(z) of degree 5 is defined by the relationship
$$P(z) = z^5 + 5z^4 + 10z^3 + 10z^2 + 9z + 5.$$
Given that z = i is a zero, deduce the remaining four zeros and use the
result to express P(z) as the simplest possible product of factors having real
coefficients.
Solution First, as the coefficients of P(z) are all real, Theorem 4-1 is applicable.
Hence if z = i is a zero, then so also is z = −i. Thus (z − i) and (z + i)
are factors, as is their product $(z - i)(z + i) = z^2 + 1$. Using ordinary
long division to divide P(z) by $(z^2 + 1)$ we find that
$$P(z)/(z^2 + 1) = z^3 + 5z^2 + 9z + 5.$$
Hence to find the remaining factors we must now factorize this cubic polynomial.
As the degree is odd, and the coefficients are real, Corollary 4-1 (c)
applies showing that it must have at least one real zero. At this point we have
recourse to trial and error to find the real zero which for the purposes of this
example has been made an integer.
Thus, setting
$$Q(z) = z^3 + 5z^2 + 9z + 5,$$
we must find a value $z = z_1$ such that $Q(z_1) = 0$. By inspection we see that
$Q(-1) = 0$ showing that the real zero is z = −1. This corresponds to the
linear factor with real coefficients (z + 1). Removing the factor (z + 1)
from the cubic by long division, we then find that
$$\frac{P(z)}{(z^2 + 1)(z + 1)} = \frac{Q(z)}{(z + 1)} = z^2 + 4z + 5.$$
Finally we apply the standard formula for the roots of a quadratic to this
expression to obtain the remaining two zeros. Completing the calculation,
these are found to be z = −2 − i and z = −2 + i. Thus the five zeros are
z = i, z = −i, z = −1, z = −2 − i, and z = −2 + i. The required
factorization is
$$P(z) = (z + 1)(z^2 + 1)(z^2 + 4z + 5).$$
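The steps of this example can be retraced numerically. The sketch below is illustrative only: polynomials are held as lists of coefficients with the highest power first, and the helper names `horner` and `divide_poly` are our own.

```python
def horner(coeffs, z):
    """Evaluate a polynomial (coefficients, highest power first) at z."""
    value = 0
    for c in coeffs:
        value = value * z + c
    return value


def divide_poly(num, den):
    """Long division of polynomials; returns (quotient, remainder) coefficient lists."""
    num = list(num)
    quotient = []
    for i in range(len(num) - len(den) + 1):
        q = num[i] / den[0]
        quotient.append(q)
        for j, d in enumerate(den):
            num[i + j] -= q * d
    remainder = num[len(num) - len(den) + 1:]
    return quotient, remainder


if __name__ == "__main__":
    P = [1, 5, 10, 10, 9, 5]
    print(horner(P, 1j), horner(P, -1j))      # z = i and z = -i are zeros
    Q, rem1 = divide_poly(P, [1, 0, 1])       # divide by z^2 + 1
    print(Q, rem1)
    R, rem2 = divide_poly(Q, [1, 1])          # divide by z + 1
    print(R, rem2)
    print(horner(P, -2 + 1j), horner(P, -2 - 1j))
```

A zero remainder at each division confirms that the divisor is a factor, and evaluating P(z) at z = −2 ± i checks the two zeros obtained from the remaining quadratic.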
4-3 Complex numbers as vectors
So far we have discussed the basic arithmetic of complex numbers but have
not mentioned their vector properties. To do this, and to give a geometrical
representation of complex numbers, we plot them as points in a plane called
the complex plane or, sometimes, the z-plane. Specifically, we shall use the
real part of the complex number as its horizontal or x-coordinate and the
imaginary part of the complex number as its vertical or y-coordinate. Thus
to each complex number there corresponds just one point in the complex
plane and, conversely, to each point in the complex plane there corresponds
just one complex number. The relationship between points and complex
numbers is one-one. In the complex plane, the x-axis is the real axis and the
y-axis is the imaginary axis. Other accounts of this subject often refer to
this geometrical representation of complex numbers as the Argand diagram, in
honour of its inventor.
Fig. 4-1 Representation of complex numbers: (a) point representation; (b) vector
representation.
In the complex plane, a complex number may either be considered as a
point in the plane or, equivalently, as the directed straight line element from
the origin to the point in question. We shall remember this dual relationship
between points and vectors but, for simplicity, will usually speak only of
points in the complex plane.
This duality between points and vectors is indicated in Fig. 4-1 where the
complex numbers $z = 1$, $z = i$, $z = 2 + i$, $z = 2 - i$, and $z = -1 - \tfrac{1}{2}i$
have been represented as points (Fig. 4-1 (a)) and as vectors (Fig. 4-1 (b)).
In the case of the vector representation, arrows have been added to show that
the vector is drawn from the origin to the point in question.
Notice that if a number, together with its complex conjugate, is plotted
in the complex plane, as for example $2 + i$ and $2 - i$ in Fig. 4-1 (a) and (b),
then geometrically, in both the point and the vector representations, one is
obtainable from the other by reflection in the x-axis as though it were a
mirror.
Instead of adding and subtracting vectors analytically by use of Definitions
4-2 and 4-3, the same result may be achieved entirely geometrically as we now
indicate. Consider the sum of the vectors $z_1 = 2 + i$ and $z_2 = 1 + 2i$.
Analytically $z_1 + z_2 = 3 + 3i$, and Fig. 4-2 (a) shows this result. The same
result may be obtained geometrically by the following construction. If we
wish to add vector $z_2$ to $z_1$, then for the purposes of addition we shall imagine
vector $z_2$ to be freed from the origin, so that it is capable of translation
anywhere in the complex plane, but we shall assume that wherever we re-locate
it in the complex plane it will always be kept parallel to its original position,
and its length and sense will be preserved. The result of adding $z_2$ to $z_1$ is
then achieved by translating $z_2$ in the manner described until its origin is
located at the tip of vector $z_1$. The arrows of vectors $z_1$ and $z_2$ then follow
one another in the same sense, and the vector $z_1 + z_2$ is the line element
directed from the origin to the tip of the vector $z_2$ in its new position. In
Fig. 4-2 (a) this construction is represented by the lower triangle comprising
the parallelogram. Such triangles are vector triangles.
Fig. 4-2 Algebraic operations with complex numbers: (a) vector addition: $z_1 + z_2$;
(b) vector subtraction: $z_1 - z_2$.
A vector not attached to a specific origin or one which, for the purposes
of combination with another vector, is freed from its origin to be re-located in
some other part of the complex plane will be called a free vector. This is in
contrast to a vector that is attached to a definite origin which we shall call a
bound vector. In the addition of $z_2$ to $z_1$ that we have just performed, $z_1$ was
regarded as a bound vector and $z_2$ as a free vector.
Notice that by the same argument, $z_1$ may be freed and its origin translated
to the tip of the bound vector $z_2$ to form the vector $z_2 + z_1$, which is
the line element directed from the origin to the tip of vector $z_1$ in its new
position. In Fig. 4-2 (a) this construction is represented by the upper triangle
comprising the parallelogram. The fact that both constructions give rise to
the same line representing on the one hand $z_1 + z_2$, and on the other $z_2 + z_1$,
proves that vector addition is commutative, since $z_1 + z_2 = z_2 + z_1$.
Before proceeding with the discussion of subtraction, we first observe that
Definition 4-4 implies that multiplication of the bound vector $z$ by $-1$
reverses its direction. That is to say its origin remains fixed, but the line
element representing the vector is rotated about the origin through the angle
$\pi$. With this remark in mind we see that subtraction of vector $z_2$ from $z_1$
(Fig. 4-2 (b)) is just a special case of addition in which the vector to be added
is $-z_2$. The vector $-z_2$ is obtained from $z_2$ by reversing the direction of $z_2$,
as is indicated in Fig. 4-2 (b) by the dotted line directed into the fourth
quadrant. The vector $z_1 - z_2$ is then the line element directed from the origin
to the tip of the reversed vector $z_2$ in its new position. In Fig. 4-2 (b) this
construction is shown in the right-hand half of the plane. The same construction,
with the roles of $z_1$ and $z_2$ interchanged, is shown in the left-hand half
of the plane and when compared with the first result proves that
$z_1 - z_2 = -(z_2 - z_1)$. (Why?)
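The constructions just described can be checked numerically. The following is only a minimal sketch, assuming Python's built-in complex type as a stand-in for vectors in the complex plane; the variable names are illustrative and do not appear in the text.

```python
# Complex numbers treated as plane vectors: addition, commutativity, and
# subtraction carried out as addition of the reversed vector.
z1 = 2 + 1j          # z1 = 2 + i
z2 = 1 + 2j          # z2 = 1 + 2i

total = z1 + z2              # z2 translated to the tip of z1
swapped = z2 + z1            # z1 translated to the tip of z2
difference = z1 + (-z2)      # subtraction as addition of -z2

print(total, swapped, difference)
```

Both orders of addition give the same complex number, and the difference agrees with the built-in subtraction `z1 - z2`.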
Thus far, complex numbers have been seen to obey the addition, multi-
plication, and distributive axioms of real numbers, and the reader might be
forgiven for wondering if there is any significant difference between them and
the real numbers. The answer is yes. Whereas real numbers can be given a
natural order according to their size, complex numbers cannot. A glance at
Fig. 4-1 (b) makes it clear that no natural order exists in the field of complex
numbers, comprising all numbers in real-imaginary form, since even vectors
of the same length may be differently directed, for instance the pairs of vectors
1 and $i$, and $2 + i$ and $2 - i$. Whereas it makes sense to order the lengths
of vectors, since these are scalar quantities and may be so ordered, the vectors
themselves have no natural order. To further our argument we now name the
length of a vector and introduce a notation whereby it may be manipulated
in equations.
Fig. 4-3 Modulus and argument representation.
Definition 4-7 (modulus of a vector) The quantity
$$|z| = (x^2 + y^2)^{1/2}$$
is called the modulus of the vector z = x + iy. It is the length of the line
element drawn from the origin to the point (x, y) in the complex plane (see
Fig. 4-3).
Example 4-7 If $z = 3 + 4i$, then $|z| = (3^2 + 4^2)^{1/2} = 5$.
Notice that in the special case $\operatorname{Im} z = 0$, $|z|$ reduces to the absolute
value of a real number since, as always, the positive square root is involved
in the definition. The following useful results are easily verified:
$$z\bar{z} = |z|^2; \qquad |z_1 z_2| = |z_1| \cdot |z_2|.$$
If either the upper or the lower triangle comprising the parallelogram in
Fig. 4-2 (a) is considered, then clearly, when expressed in terms of the
modulus, the Euclidean theorem 'the sum of the lengths of any two sides of
a triangle exceeds the length of the third side' becomes the following inequality
relating moduli:
$$|z_1| + |z_2| \ge |z_1 + z_2|.$$ (4-11)
Equality will occur only when $z_1$ and $z_2$ are collinear. For obvious reasons
Eqn (4-11) is called the triangle inequality, and it has already been encountered
in simple form when we discussed the absolute value of the sum of two real
numbers. An analytic proof of result (4-11) is set as a problem at the end of the
chapter.
Another useful inequality relating the moduli of the complex numbers
$z_1$ and $z_2$ is
$$|z_1 + z_2| \ge \bigl||z_1| - |z_2|\bigr|,$$ (4-12)
where again equality occurs only when $z_1$ and $z_2$ are collinear. The proof of
this is also left to the reader as a problem.
Example 4-8 If $z_1 = 3 + 4i$ and $z_2 = 4 + 3i$, then $z_1 + z_2 = 7 + 7i$.
Hence $|z_1| = (3^2 + 4^2)^{1/2} = 5$, $|z_2| = (4^2 + 3^2)^{1/2} = 5$, and
$|z_1 + z_2| = (7^2 + 7^2)^{1/2} = \sqrt{98}$, so that $|z_1| + |z_2| = 10$ and
$\bigl||z_1| - |z_2|\bigr| = 0$. We have thus verified inequalities (4-11) and (4-12)
in this special case, for they demand that for any $z_1$ and $z_2$
$$\bigl||z_1| - |z_2|\bigr| \le |z_1 + z_2| \le |z_1| + |z_2|,$$
which in this case corresponds to the valid inequality
$$0 \le \sqrt{98} \le 10.$$
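The verification in Example 4-8 can be repeated numerically. This is a sketch only, assuming that Python's built-in `abs` applied to a complex number plays the part of the modulus of Definition 4-7; the names `lower`, `middle`, and `upper` are illustrative.

```python
import math

z1 = 3 + 4j
z2 = 4 + 3j

lower = abs(abs(z1) - abs(z2))   # | |z1| - |z2| |, left side of (4-12)
middle = abs(z1 + z2)            # |z1 + z2|
upper = abs(z1) + abs(z2)        # |z1| + |z2|, left side of (4-11)

print(lower, middle, upper)
print(math.isclose(middle, math.sqrt(98)))
```

The three printed values should appear in non-decreasing order, as the two inequalities together require.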
4-4 Modulus-argument form of complex numbers
Referring again to Fig. 4-3, we see that the complex number $z$ need not be
specified in the standard form for it may equally well be specified by giving
both the value of $|z|$ and the angle $\theta$ which, by convention, is always measured
positively in an anti-clockwise direction from the x-axis to the line of the
vector $z$. The angle $\theta$ is the argument of $z$ and we shall write $\theta = \arg z$. The
argument of $z$ is indeterminate with respect to multiples of $2\pi$, because angles
$\theta$ and $\theta + 2k\pi$, where $k$ is any integer, will give rise to the same line on
Fig. 4-3. Later we shall see that this indeterminacy in $\theta$ plays an important
role in the determination of the roots of complex numbers. When $\theta = \arg z$
is restricted to the interval $-\pi < \theta \le \pi$, it will be termed the principal value
of the argument.
If we define the real number $r$ by the equation $r = |z|$, and still set
$\theta = \arg z$, then the ordered number pair $(r, \theta)$ describes the polar coordinates
of the point $z$ in Fig. 4-3, that is, the radial distance of the point from the
origin together with its bearing measured from a fixed line through the
origin. The relationship between the Cartesian coordinates $(x, y)$ and the
polar coordinates $(r, \theta)$ of the same complex number $z$ is immediate, since
from Fig. 4-3 we have
$$x = r\cos\theta, \qquad y = r\sin\theta$$ (4-13)
or, equivalently,
$$r = (x^2 + y^2)^{1/2}, \qquad \cos\theta = \frac{x}{r}, \qquad \sin\theta = \frac{y}{r}.$$ (4-14)
Thus the complex number, or vector, $z = x + iy$ may also be written in the
modulus-argument form
$$z = r(\cos\theta + i\sin\theta).$$ (4-15)
Because $\arg z$ is indeterminate up to an angle $2k\pi$, we must phrase our
definition of equality between two complex numbers carefully when it is to
refer to complex numbers expressed in modulus-argument form.
Definition 4-8 The two numbers $z_1 = r(\cos\theta + i\sin\theta)$ and
$z_2 = \rho(\cos\phi + i\sin\phi)$ expressed in modulus-argument form will be said to be
equal if, and only if, $r = \rho$ and $\theta = \phi + 2k\pi$ for some integer $k$.
Equations (4-13) and (4-14) enable immediate interchange between the
modulus-argument and the real-imaginary forms of z, as the following
examples indicate.
Example 4-9
(a) Express $z = -4\sqrt{3} + 4i$ in modulus-argument form;
(b) Express $z = 2 + 5i$ in modulus-argument form;
(c) If $|z| = 3$ and $\arg z = -\pi/10$, express $z$ in real-imaginary form.
Solution (a) From Eqn (4-14), $r = |z| = [(-4\sqrt{3})^2 + 4^2]^{1/2} = 8$, whilst
$\cos\theta = -(4\sqrt{3})/8 = -(\sqrt{3})/2$ and $\sin\theta = 4/8 = \tfrac{1}{2}$, from which we deduce
that the principal value of $\theta$ must lie in the second quadrant with
$\theta = \arg z = 5\pi/6$. Hence, in modulus-argument form
$$z = 8\left(\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6}\right).$$
Notice that although we could have written $\theta = \arg z = \arctan(-1/\sqrt{3})$,
it would not then have been clear in which quadrant $\theta$ must lie, and,
consequently, we shall always specify $\sin\theta$ and $\cos\theta$ separately.
Solution (b) Again from Eqn (4-14), $r = |z| = (2^2 + 5^2)^{1/2} = \sqrt{29}$, whilst
this time $\cos\theta = 2/\sqrt{29}$ and $\sin\theta = 5/\sqrt{29}$, from which we deduce that
the principal value of $\theta$ must lie in the first quadrant with $\theta = \arg z = 1.1903$
rad. Hence, in modulus-argument form
$$z = \sqrt{29}(\cos 1.1903 + i\sin 1.1903).$$
Solution (c) The result is immediate, since Eqn (4-15) gives
$$z = 3\left\{\cos\left(-\frac{\pi}{10}\right) + i\sin\left(-\frac{\pi}{10}\right)\right\} = 2.8532 - 0.9271i.$$
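The three conversions of Example 4-9 may be reproduced with the standard `cmath` module. This is a minimal sketch, assuming that `cmath.polar` (which returns the modulus together with the principal value of the argument) and `cmath.rect` are acceptable stand-ins for Eqns (4-14) and (4-15); the variable names are illustrative.

```python
import cmath
import math

# (a) z = -4*sqrt(3) + 4i: modulus and principal argument
r_a, theta_a = cmath.polar(complex(-4 * math.sqrt(3), 4))

# (b) z = 2 + 5i
r_b, theta_b = cmath.polar(2 + 5j)

# (c) |z| = 3 and arg z = -pi/10: back to real-imaginary form
z_c = cmath.rect(3, -math.pi / 10)

print(r_a, theta_a, 5 * math.pi / 6)
print(r_b, theta_b)
print(z_c)
```

Because `cmath.polar` works from the real and imaginary parts separately, it places the angle in the correct quadrant, which is the point made above about not relying on the arc tangent alone.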
We now examine the consequences of multiplication and division for
complex numbers expressed in modulus-argument form. Let $z_1$ and $z_2$ be
the two complex numbers:
$$z_1 = r_1(\cos\theta_1 + i\sin\theta_1) \quad\text{and}\quad z_2 = r_2(\cos\theta_2 + i\sin\theta_2).$$ (4-16)
Then by direct multiplication we find that
$$z_1 z_2 = r_1 r_2[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2)],$$
and using the trigonometric identities for $\cos(\theta_1 + \theta_2)$ and $\sin(\theta_1 + \theta_2)$
this may be written as
$$z_1 z_2 = r_1 r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)].$$ (4-17)
We have thus proved that the product $z_1 z_2$ is a complex number
with modulus $|z_1 z_2| = r_1 r_2$ and argument
$\arg(z_1 z_2) = \theta_1 + \theta_2 = \arg z_1 + \arg z_2$. Thus the result of multiplying
two complex numbers is to produce a complex number whose modulus is the
product of the two separate moduli and whose argument is the sum of the two
separate arguments (see Fig. 4-4).
A special case results if we write
$$i = \cos\tfrac{1}{2}\pi + i\sin\tfrac{1}{2}\pi.$$ (4-18)
It follows that in the z-plane, multiplication by $i$ corresponds geometrically
to an anti-clockwise rotation through $\tfrac{1}{2}\pi$ without any change of size. To
illustrate this, the vectors $iz_1$ and $iz_2$ have been added to Fig. 4-4.
Fig. 4-4 Multiplication and division: $z_1 z_2$, $z_1/z_2$.
By repeated application of Eqn (4-17) it is easily proved that if
$z_m = r_m(\cos\theta_m + i\sin\theta_m)$ for $m = 1, 2, \ldots, n$, then
$$z_1 z_2 \cdots z_n = r_1 r_2 \cdots r_n[\cos(\theta_1 + \theta_2 + \cdots + \theta_n) + i\sin(\theta_1 + \theta_2 + \cdots + \theta_n)].$$ (4-19)
An argument essentially similar to that which gave rise to Eqn (4-17),
but this time using the trigonometric identities for $\cos(\theta_1 - \theta_2)$ and
$\sin(\theta_1 - \theta_2)$, establishes that whenever $z_2 \ne 0$, then with the same notation
we have
$$\frac{z_1}{z_2} = \frac{r_1}{r_2}[\cos(\theta_1 - \theta_2) + i\sin(\theta_1 - \theta_2)].$$ (4-20)
Obviously $|z_1/z_2| = r_1/r_2 = |z_1|/|z_2|$ and
$\arg(z_1/z_2) = \theta_1 - \theta_2 = \arg z_1 - \arg z_2$. Expressed in words, this says
that the result of dividing two complex numbers is to produce a complex number
whose modulus is the quotient of the separate moduli and whose argument is
the difference of the two separate arguments.
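The rules for the modulus and argument of a product and a quotient can be illustrated numerically. The sketch below assumes two arbitrarily chosen numbers (moduli 2 and 1/2, arguments $\pi/3$ and $\pi/4$, none of which come from the text) and uses `cmath.polar` and `cmath.rect` from the standard library.

```python
import cmath
import math

# Hypothetical sample values chosen so that all resulting arguments stay
# inside the principal range and no 2*pi adjustment is needed.
r1, theta1 = 2.0, math.pi / 3
r2, theta2 = 0.5, math.pi / 4
z1 = cmath.rect(r1, theta1)
z2 = cmath.rect(r2, theta2)

prod_mod, prod_arg = cmath.polar(z1 * z2)     # expect r1*r2, theta1 + theta2
quot_mod, quot_arg = cmath.polar(z1 / z2)     # expect r1/r2, theta1 - theta2
rot_mod, rot_arg = cmath.polar(1j * z1)       # multiplication by i: rotate by pi/2

print(prod_mod, prod_arg)
print(quot_mod, quot_arg)
print(rot_mod, rot_arg)
```

For other choices of arguments the sum or difference may fall outside the principal range, in which case the computed argument differs from it by a multiple of $2\pi$, in keeping with Definition 4-8.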
A most important special case of Eqn (4-19) occurs when all the $z_1, z_2,
\ldots, z_n$ are equal to the same complex number $z = r(\cos\theta + i\sin\theta)$, say.
The result then becomes
$$z^n = r^n(\cos n\theta + i\sin n\theta).$$
Substituting for $z$ and cancelling a real factor $r^n$, we obtain the following
important theorem.
Theorem 4-2 (de Moivre's Theorem)
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta.$$
A more subtle argument would have yielded the fact that this remarkable
result is true for all real values of n, and not just for the integral values
utilized in our proof. This will be undertaken later when the complex exponen-
tial function has been discussed.
Theorem 4-2 provides a simple method by which certain forms of trigo-
nometric identity may be established. One typical example is enough to
illustrate this.
Example 4-10 Let us relate $\sin 4\theta$ and $\cos 4\theta$ to sums of powers of $\sin\theta$
and $\cos\theta$. Set $n = 4$ in Theorem 4-2 and expand the left-hand side by the
binomial theorem, using the fact that $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, etc., to
obtain
$$\cos^4\theta + 4i\cos^3\theta\sin\theta - 6\cos^2\theta\sin^2\theta - 4i\cos\theta\sin^3\theta + \sin^4\theta = \cos 4\theta + i\sin 4\theta.$$
Then, recalling that equality of complex numbers means equality of their
real and imaginary parts considered separately, we have the two results:
equality of real parts
$$\cos^4\theta - 6\cos^2\theta\sin^2\theta + \sin^4\theta = \cos 4\theta,$$
and
equality of imaginary parts
$$4(\cos^3\theta\sin\theta - \cos\theta\sin^3\theta) = \sin 4\theta.$$
These are the desired results. It is characteristic of complex numbers that any
single complex equality implies two real equalities, and even if only one is
sought the other will be generated automatically. The same method works
for any positive integral value of $n$ when it will connect $\sin n\theta$ and $\cos n\theta$
with sums of powers of $\sin\theta$ and $\cos\theta$.
We shall return to this idea in connection with the exponential function,
and show that it is possible to use de Moivre's theorem to express $\sin^n\theta$
and $\cos^n\theta$ in terms of sums involving $\sin r\theta$ and $\cos r\theta$.
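The two identities of Example 4-10 are easily spot-checked at a few angles. This is only a numerical sanity check under the assumption of ordinary floating-point arithmetic, not a proof; the function name `cos4_sin4` is illustrative.

```python
import math

def cos4_sin4(theta):
    """Evaluate the right-hand expansions for cos(4*theta) and sin(4*theta)."""
    c, s = math.cos(theta), math.sin(theta)
    cos4 = c**4 - 6 * c**2 * s**2 + s**4      # real part of (c + i s)^4
    sin4 = 4 * (c**3 * s - c * s**3)          # imaginary part of (c + i s)^4
    return cos4, sin4

for theta in (0.3, 1.1, 2.5):
    cos4, sin4 = cos4_sin4(theta)
    print(theta, cos4 - math.cos(4 * theta), sin4 - math.sin(4 * theta))
```

The printed differences should be zero to within rounding error.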
Sometimes Theorem 4-2 can be used to reduce the labour of computation
as now shown.
Example 4-11 We shall evaluate $z^{10}$ where $z = 1 + i$. Rather than making
repeated multiplications, or applying the binomial theorem, we write $z$ in
modulus-argument form as $z = \sqrt{2}(\cos\pi/4 + i\sin\pi/4)$, when we have
$z^{10} = (\sqrt{2})^{10}(\cos\pi/4 + i\sin\pi/4)^{10}$. By de Moivre's theorem this becomes
$$z^{10} = 2^5\left(\cos\frac{10\pi}{4} + i\sin\frac{10\pi}{4}\right) = 32i.$$
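The computation of Example 4-11 can be compared with direct exponentiation. The sketch below assumes Python's built-in complex power as the "repeated multiplication" route and `cmath.rect` for the de Moivre route; the names are illustrative.

```python
import cmath

z = 1 + 1j
r, theta = cmath.polar(z)                       # r = sqrt(2), theta = pi/4

by_de_moivre = cmath.rect(r**10, 10 * theta)    # r^10 (cos 10*theta + i sin 10*theta)
by_power = z**10                                # direct exponentiation

print(by_de_moivre, by_power)
```

Both values should agree with $32i$ up to rounding error in the real part.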
4-5 Roots of complex numbers
When performing algebra on real numbers the idea of the root of a number
plays a fundamental part. The same is true when manipulating complex
numbers, and we now discuss the general ideas involved in determining their
roots.
Let $p/q$ be any rational number, where $p$ and $q$ are integers with $q$ supposed
positive. We shall assume that $p$ and $q$ have no common factor.
Definition 4-9 We define $z^{p/q}$ by saying that:
$$w = z^{p/q} \iff w^q = z^p.$$
Let
$$w = \rho(\cos\phi + i\sin\phi) \quad\text{and}\quad z = r(\cos\theta + i\sin\theta).$$ (4-21)
Then from Definition 4-9 and de Moivre's theorem we have
$$\rho^q(\cos q\phi + i\sin q\phi) = r^p(\cos p\theta + i\sin p\theta).$$ (4-22)
Now from Definition 4-8 it follows that
$$\rho^q = r^p \quad\text{and}\quad q\phi = p\theta + 2k\pi,$$ (4-23)
and so
$$\rho = r^{p/q} \quad\text{and}\quad \phi = \frac{p\theta + 2k\pi}{q}.$$ (4-24)
The expressions $w = z^{p/q}$ thus have the general form
$$w = z^{p/q} = r^{p/q}\left[\cos\left(\frac{p\theta + 2k\pi}{q}\right) + i\sin\left(\frac{p\theta + 2k\pi}{q}\right)\right],$$ (4-25)
with $k$ an integer.
It is easily seen that only $q$ different values $w_0, w_1, w_2, \ldots, w_{q-1}$ of $w$ will
result from Eqn (4-25) as the integer $k$ increases through successive integral
values. It is usual to give $k$ the $q$ successive values $k = 0, 1, 2, \ldots, q - 1$.
If $k$ is allowed to increase beyond the value $q - 1$, then the numbers $w_0, w_1,
\ldots, w_{q-1}$ will simply be generated again because of the periodicity properties
of the sine and cosine functions.
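Equation (4-25) translates directly into a short routine. The following is a sketch under stated assumptions: $z \ne 0$, $q$ a positive integer, and $p$, $q$ with no common factor; the function name `rational_power` is hypothetical and not part of any library.

```python
import cmath
import math

def rational_power(z, p, q):
    """Return the q values of z**(p/q) from Eqn (4-25), for k = 0, ..., q-1.

    Assumes z != 0, q > 0, and p, q integers with no common factor.
    """
    r, theta = cmath.polar(z)          # modulus and principal argument of z
    rho = r ** (p / q)                 # common modulus of all the values
    return [
        cmath.rect(rho, (p * theta + 2 * k * math.pi) / q)
        for k in range(q)
    ]

# The fifth roots of unity: z = 1, p = 1, q = 5.
fifth_roots = rational_power(1 + 0j, 1, 5)
for w in fifth_roots:
    print(w, abs(w))
```

Each value returned should satisfy $w^q = z^p$ to within rounding error, which is exactly the condition of Definition 4-9.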
Example 4-12 We illustrate the use of Eqn (4-25) by determining the $n$
numbers $w$ satisfying the equation $w = (1)^{1/n}$. For obvious reasons these are
called the nth roots of unity. Comparing this equation with the general expression
$w = z^{p/q}$ that has just been discussed we see that we must make the
identifications $z = 1$, $p = 1$, and $q = n$. To proceed further we must write
the number unity in its modulus-argument form
$$1 = 1 \cdot (\cos 0 + i\sin 0),$$
so that comparing this with $z$ in Eqn (4-21) we see that the further identifications
$r = 1$ and $\theta = 0$ must be made. Substitution of these quantities into
Eqn (4-25) then gives the result
$$w_k = \cos\frac{2k\pi}{n} + i\sin\frac{2k\pi}{n}, \quad\text{with } k = 0, 1, 2, \ldots, n - 1.$$
The result of this calculation with $n = 5$, for example, is to generate the
fifth roots of unity. In Fig. 4-5 these roots are plotted as the numbers $w_0, w_1,
\ldots, w_4$ in the complex plane. They are uniformly distributed around the
unit circle centred on the origin. By making use of the vector properties of
complex numbers we shall usually represent this circle by the convenient
notation $|z| = 1$. (Why?)
Fig. 4-5 Fifth roots of unity.
Fig. 4-6 Roots of $w = (1 + i)^{2/3}$.
Example 4-13 As a slightly more general example we now determine $z^{2/3}$,
when $z = 1 + i$. In this case $p = 2$, $q = 3$, and in modulus-argument form,
$z = \sqrt{2}(\cos\pi/4 + i\sin\pi/4)$ showing that $r = \sqrt{2}$ and $\theta = \pi/4$. Substitution
into Eqn (4-25) gives
$$w_k = 2^{1/3}\left[\cos\left(\frac{1 + 4k}{6}\right)\pi + i\sin\left(\frac{1 + 4k}{6}\right)\pi\right], \quad\text{with } k = 0, 1, 2.$$
The three roots $w_0$, $w_1$, and $w_2$ are thus:
$(k = 0)$: $w_0 = 2^{1/3}\left(\cos\frac{\pi}{6} + i\sin\frac{\pi}{6}\right) = 2^{1/3}\left(\frac{\sqrt{3}}{2} + \frac{i}{2}\right)$,
$(k = 1)$: $w_1 = 2^{1/3}\left(\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6}\right) = 2^{1/3}\left(-\frac{\sqrt{3}}{2} + \frac{i}{2}\right)$,
$(k = 2)$: $w_2 = 2^{1/3}\left(\cos\frac{3\pi}{2} + i\sin\frac{3\pi}{2}\right) = -2^{1/3}i$.
These are plotted in the complex plane in Fig. 4-6, where they are seen to be
uniformly distributed around the circle $|z| = 2^{1/3}$.
Example 4-14 As a final example let us find the roots of the equation
$$w = i^{-1/3}.$$
In terms of the notation of Eqns (4-21) and (4-25), and recalling that we have
agreed always to take $q$ as positive, we have $p = -1$, $q = 3$, and $z = i$.
Now in modulus-argument form
$$i = 1 \cdot \left(\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right)$$
so that $r = 1$ and $\theta = \pi/2$. Hence, substituting into Eqn (4-25), we find that
$$w_k = \cos\left[\frac{(-\pi/2) + 2k\pi}{3}\right] + i\sin\left[\frac{(-\pi/2) + 2k\pi}{3}\right], \quad\text{with } k = 0, 1, 2.$$
Hence the three roots $w_0$, $w_1$, and $w_2$ are:
$(k = 0)$: $w_0 = (\cos\pi/6 - i\sin\pi/6) = \tfrac{1}{2}(\sqrt{3} - i)$,
$(k = 1)$: $w_1 = (\cos\pi/2 + i\sin\pi/2) = i$,
$(k = 2)$: $w_2 = (\cos 7\pi/6 + i\sin 7\pi/6) = -\tfrac{1}{2}(\sqrt{3} + i)$.
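The roots quoted in Examples 4-13 and 4-14 can be checked against Definition 4-9 by raising each to the power $q = 3$ and comparing with $z^p$. This is a sketch only; the list and variable names are illustrative.

```python
import math

cube_root_2 = 2 ** (1 / 3)

# Roots of w = (1 + i)^(2/3) as listed in Example 4-13.
roots_4_13 = [
    cube_root_2 * complex(math.sqrt(3) / 2, 0.5),
    cube_root_2 * complex(-math.sqrt(3) / 2, 0.5),
    cube_root_2 * (-1j),
]

# Roots of w = i^(-1/3) as listed in Example 4-14.
roots_4_14 = [
    0.5 * complex(math.sqrt(3), -1),
    1j,
    -0.5 * complex(math.sqrt(3), 1),
]

# Definition 4-9 requires w^3 = z^p: (1 + i)^2 in the first case, 1/i in the second.
residuals_4_13 = [abs(w**3 - (1 + 1j) ** 2) for w in roots_4_13]
residuals_4_14 = [abs(w**3 - 1 / 1j) for w in roots_4_14]

print(residuals_4_13)
print(residuals_4_14)
```

All six residuals should be zero to within rounding error.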
This completes our preliminary encounter with complex numbers, and
our study will be resumed later in connection with the complex exponential
function and with functions of a complex variable. The remainder of this
chapter is devoted to developing the foundations of our study of general
vectors.
4-6 Introduction to space vectors
It is clear that any set of vector quantities that do not all lie in a plane cannot
be represented vectorially in the form of complex numbers. For example,
even the vectors describing the velocity of a vehicle as it is driven at constant
speed past fixed points on a winding hill could not be so represented. Pair-
wise these velocity vectors define planes, and so could be represented by
complex numbers in those planes, though different pairs of vectors would
define different planes, thereby making any general representation impossible
in terms of complex numbers. The trouble here is not hard to find. It is that
complex numbers just happen to be capable of representation as planar
vectors with their own appropriate descriptive language, and they were not
developed with general vector representation in mind. In short, they are
complex numbers first and vectors second; not the other way around.
To overcome this limitation and to be able to describe arbitrary vector
quantities we must preserve the idea of a vector as a directed length, but
re-think its description. This is best achieved using a diagram, so consider
Fig. 4-7 which depicts the mutually perpendicular Cartesian axes O{x, y, z}
with origin O. In more mathematical terms we describe these axes as being
mutually orthogonal. This is a technical term that in a geometrical context
has the same meaning as perpendicular, though it is often used in a wider
sense, when the word perpendicular would be inappropriate. Henceforth we
shall almost always use the term orthogonal.
Fig. 4-7 Right-handed Cartesian axes.
The manner of identification of the x, y, and z coordinate axes is not
arbitrary, but is made in such a manner that they form a right-handed system
of axes. By this we mean that having assigned axes for the variables x and y,
together with the directions in which they increase positively, the direction of
positive z is then chosen to be that in which a right-handed screw would
advance were it aligned with the third axis and rotated in the sense x to y.
This sense of rotation is indicated in Fig. 4-7 by means of a directed spiral
about the z-axis. In the diagram the y- and z-axes are supposed to lie in the
plane of the paper with the x-axis pointing out of the paper towards the
viewer. Later we shall refer to this right-handed property in connection with
axes which are not orthogonal, when right-handedness is still to be interpreted
in exactly the same sense as above.
This right-handed property of the system of axes is shared by each pair of
axes in turn, provided the senses of rotation are appropriately defined. The
following table describes the convention that is always adopted.
Table 4-1 Right-handed axes
Rotate              R-H screw advances
From      To        in direction of positive
x         y         z
y         z         x
z         x         y
The table can easily be remembered in the concise form
x y z
y z x
z x y
where the entry in any row is obtained from the entry in the row above by
transferring the first letter of that entry to the last position. These entries are
called cyclic permutations of the letters x, y, and z, and further cyclic
permutations will simply regenerate the table. These rules describe the
right-handed symmetry of the O{x, y, z} axes. If any two letters in an entry are
interchanged, then by the same rule, the negative direction of the third axis is
defined. Hence the set of letters y x z are to be interpreted 'rotate from y to
x to make a right-handed screw aligned with the z-axis advance in the direction
of negative z'.
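The rule for generating the table, and for deciding whether an ordered triple of letters corresponds to the positive or the negative direction of the third axis, may be sketched as follows. The function names `cyclic_permutations` and `screw_sense` are hypothetical, and the sketch assumes that each entry is some ordering of the three letters x, y, z.

```python
def cyclic_permutations(letters):
    """Rows obtained by repeatedly moving the first letter to the last position."""
    rows = []
    row = tuple(letters)
    for _ in range(len(row)):
        rows.append(row)
        row = row[1:] + row[:1]
    return rows

table = cyclic_permutations("xyz")

def screw_sense(entry):
    """Return +1 if entry is a cyclic permutation of x, y, z, and -1 otherwise.

    Assumes entry is an ordering of the three letters x, y, z.
    """
    return 1 if tuple(entry) in table else -1

print(table)
print(screw_sense("yxz"))   # two letters interchanged: negative z direction
```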
If in the above argument a right-handed screw motion had been replaced
by a left-handed screw motion, then a left-handed system of axes would have
resulted. Although a left-handed system of axes is in all respects equivalent to
a right-handed system for the purposes of vector representation, it is customary
to work with right-handed systems.
Let P be the point with coordinates $x = a_1$, $y = a_2$, and $z = a_3$ illustrated
in Fig. 4-7. We shall denote it by the more concise notation $(a_1, a_2, a_3)$ where
the first, second, and third entries in this ordered number triple represent the
x, y, and z coordinates, respectively. Then from the point of view of coordinate
geometry it is the point P that is of interest, whereas from the point of view
of vectors it is the directed line element from O to P that is of interest. To
signify that it is the vector quantity that interests us here we shall write OP.
Notice that by this convention the vector PO is the directed line from P to O
and is opposite in sense to OP. In future we will denote the length of the vector
OP by |OP|, which is a scalar, and by definition this length will always be
positive.
In Fig. 4-7 the lengths $OA = a_1$, $OB = a_2$, and $OC = a_3$ are called the
orthogonal projections of OP onto the x-, y-, and z-axes, and a simple application
of Pythagoras' theorem gives the result
$$|OP|^2 = (OA)^2 + (OB)^2 + (OC)^2$$
or,
$$|OP|^2 = a_1^2 + a_2^2 + a_3^2.$$
Dividing by $|OP|^2$ this becomes
$$1 = \left(\frac{a_1}{|OP|}\right)^2 + \left(\frac{a_2}{|OP|}\right)^2 + \left(\frac{a_3}{|OP|}\right)^2,$$
which can then be rewritten in terms of the angles $\theta_1$, $\theta_2$, $\theta_3$ of inclination
of OP to the x-, y-, and z-axes as
$$1 = \cos^2\theta_1 + \cos^2\theta_2 + \cos^2\theta_3.$$ (4-26)
If the numbers $l$, $m$, and $n$ are defined by the relations
$$l = \cos\theta_1, \qquad m = \cos\theta_2, \qquad n = \cos\theta_3,$$ (4-27)
then Eqn (4-26) becomes
$$1 = l^2 + m^2 + n^2.$$ (4-28)
For obvious reasons $l$, $m$, and $n$ are called the direction cosines of OP with
respect to the axes O{x, y, z} and it is often convenient to write them in the
form of an ordered number triple as $\{l, m, n\}$. The angles $\theta_1$, $\theta_2$, and $\theta_3$ are
indeterminate to within a multiple of $2\pi$ and, by convention, they will always
be taken to lie in the interval $[0, \pi]$.
Consider the direction cosines $l$, $m$, $n$ as defining a point P' in space with
coordinates $x = l$, $y = m$, and $z = n$, then, by Pythagoras' theorem and
Eqn (4-28), the vector OP' must have unit length. The direction and sense of
OP' are the same as those of OP; only the lengths are different. Vectors of
unit length in given directions prove to be extremely useful in vector analysis
so they are appropriately called unit vectors.
Now by definition, the direction cosines $l$, $m$, $n$ are proportional to the
coordinates $a_1$, $a_2$, $a_3$ of the point P and consequently the numbers $a_1$, $a_2$,
and $a_3$ are often called the direction ratios of OP. To convert direction ratios
to direction cosines it is necessary to normalize them by dividing by the
square root of the sum of the squares of the direction ratios. This is, of course,
equivalent to division by the quantity we have agreed to denote by |OP|.
Example 4-15 Find the direction ratios, the direction cosines and the angles
$\theta_1$, $\theta_2$, and $\theta_3$ of the vector OP, where P is the point (1, -2, 4).
Solution The direction ratios are 1, -2, 4, and |OP|, which is the square
root of the sum of the squares of the direction ratios, is
$$|OP| = (1^2 + (-2)^2 + 4^2)^{1/2} = \sqrt{21}.$$
Hence the direction cosines of OP are $l = 1/\sqrt{21}$, $m = -2/\sqrt{21}$, and
$n = 4/\sqrt{21}$, from which the angles $\theta_1$, $\theta_2$, and $\theta_3$ are seen to be 1.351, 2.022,
and 0.510 radians, respectively. Unless otherwise stated we shall always
express angles in terms of radians, as here.
Example 4-16 Determine the angles of inclination $\theta_1$, $\theta_2$, and $\theta_3$ of a vector
to the x-, y-, and z-axes, respectively, given that its direction cosines are:
(a) $\{\tfrac{1}{2}, -\sqrt{3}/2, 0\}$,
(b) $\{\tfrac{1}{2}, \tfrac{1}{4}, \sqrt{11}/4\}$.
Solution (a) Here $l = \cos\theta_1 = 1/2$, $m = \cos\theta_2 = -\sqrt{3}/2$, $n = \cos\theta_3 = 0$,
so that $\theta_1 = \pi/3$, $\theta_2 = 5\pi/6$, and $\theta_3 = \pi/2$. Hence in this case the vector
lies entirely in the (x, y)-plane.
Solution (b) In this case, $l = \cos\theta_1 = 1/2$, $m = \cos\theta_2 = 1/4$,
$n = \cos\theta_3 = \sqrt{11}/4$, so that $\theta_1 = \pi/3$, $\theta_2 = 1.318$, and $\theta_3 = 0.593$.
Example 4-17 If a vector has direction cosines $\{\tfrac{1}{2}, m, \tfrac{1}{2}\}$ deduce the possible
values of $m$. If, in addition, it is stated that the vector makes an obtuse angle
$\theta_2$ with the y-axis determine the value of $\theta_2$.
Solution We use Eqn (4-28), setting $l = \tfrac{1}{2}$ and $n = \tfrac{1}{2}$ to obtain
$$(\tfrac{1}{2})^2 + m^2 + (\tfrac{1}{2})^2 = 1.$$
Whence, $m^2 = 1/2$ or $m = \pm 1/\sqrt{2}$. These values of $m$ correspond to
$\theta_2 = \pi/4$ for $m = 1/\sqrt{2}$, and to $\theta_2 = 3\pi/4$ for $m = -1/\sqrt{2}$. As the angle $\theta_2$
is required to be obtuse we must select $\theta_2 = 3\pi/4$.
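The normalization of direction ratios and the recovery of the angles of inclination can be sketched in a few lines. The function names below are hypothetical; the sketch assumes a non-zero vector and uses `math.acos`, whose values lie in $[0, \pi]$, in agreement with the convention adopted for $\theta_1$, $\theta_2$, and $\theta_3$.

```python
import math

def direction_cosines(a1, a2, a3):
    """Divide the direction ratios by |OP| (assumes the vector is non-zero)."""
    length = math.sqrt(a1**2 + a2**2 + a3**2)
    return (a1 / length, a2 / length, a3 / length)

def inclination_angles(l, m, n):
    """Angles to the x-, y-, and z-axes, each taken in the interval [0, pi]."""
    return tuple(math.acos(c) for c in (l, m, n))

# The vector OP of Example 4-15, with P the point (1, -2, 4).
l, m, n = direction_cosines(1, -2, 4)
theta1, theta2, theta3 = inclination_angles(l, m, n)

print(l, m, n, l**2 + m**2 + n**2)
print(theta1, theta2, theta3)
```

The sum of squares printed on the first line illustrates Eqn (4-28).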
The idea of a fixed origin is fundamental to coordinate geometry though
it proves to be rather too restrictive in vector analysis. This is because it is
only the magnitude, direction, and sense of a vector that usually matter, and
not the choice of origin and coordinate system in which the vector is repre-
sented. For example, when specifying a wind velocity it is normally sufficient
to say 20 ft/s due East, without identifying the particular points in space at
which the air has this velocity.
In vector work this ambiguity as to the location of a vector in space is
allowed by considering as equivalent any two vectors that may be represented
by directed line elements of equal length which are parallel, and have
the same sense. In Fig. 4-8 we have depicted two vectors OP and O'P' that
are equivalent in the sense just defined. Another way to define this equivalence
is to require that when the axes O{x, y, z} are translated, without rotation, to
the position O'{x', y', z'}, the coordinates of P' with respect to the axes
through O' are the same as those of P with respect to the axes through O.
That is, if P is the point $(a_1, a_2, a_3)$ in the system of axes O{x, y, z}, then P'
is the point $(a_1, a_2, a_3)$ in the system of axes O'{x', y', z'}. Do not get confused
by this. If O' is the point $(\alpha_1, \alpha_2, \alpha_3)$ with respect to O{x, y, z}, then
coordinates in the unprimed system are related to those in the primed system by the
equations $x = \alpha_1 + x'$, $y = \alpha_2 + y'$, and $z = \alpha_3 + z'$.
Fig. 4-8 Translation of axes without rotation.
This freedom to translate vectors now enables us to give direction cosines
to any vector in space and not just to those having their base at O. Suppose,
for example, that we require the length and direction cosines of the vector AB,
where A is the point $(a_1, a_2, a_3)$ and B is the point $(b_1, b_2, b_3)$ when expressed
relative to some set of axes O{x, y, z}. Then we see at once that the lengths
of the projections of AB on the x, y, and z axes are $(b_1 - a_1)$, $(b_2 - a_2)$,
and $(b_3 - a_3)$, respectively. Accordingly, by translating the vector AB until
A in its new position A' coincides with O, we see that the tip B in its new
position B' must be the point $((b_1 - a_1), (b_2 - a_2), (b_3 - a_3))$ (see Fig. 4-9).
Hence |AB|, that is the length of AB, is
$$|AB| = [(b_1 - a_1)^2 + (b_2 - a_2)^2 + (b_3 - a_3)^2]^{1/2}.$$ (4-29)
The direction cosines of AB then follow as before and are
$$l = \frac{b_1 - a_1}{|AB|}, \qquad m = \frac{b_2 - a_2}{|AB|}, \qquad n = \frac{b_3 - a_3}{|AB|}.$$ (4-30)
Fig. 4-9 Translation of a vector.
Example 4-18 Find |AB| and the direction cosines of the vector AB, if A
has coordinates (1, 2, 3) and B the coordinates (4, 3, 6).
Solution From Eqn (4-29) we see that
$|AB| = [(4 - 1)^2 + (3 - 2)^2 + (6 - 3)^2]^{1/2} = \sqrt{19}$, whilst from Eqn (4-30)
it follows that $l = 3/\sqrt{19}$, $m = 1/\sqrt{19}$, and $n = 3/\sqrt{19}$.
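Equations (4-29) and (4-30) may be combined into a single routine. This is a minimal sketch, assuming that points are given as triples of numbers and that A and B are distinct; the function name is hypothetical.

```python
import math

def length_and_direction_cosines(a, b):
    """Length of AB from Eqn (4-29) and its direction cosines from Eqn (4-30).

    Assumes a and b are coordinate triples of two distinct points.
    """
    d = [bi - ai for ai, bi in zip(a, b)]          # projections of AB on the axes
    length = math.sqrt(sum(c * c for c in d))
    return length, tuple(c / length for c in d)

# Example 4-18: A = (1, 2, 3), B = (4, 3, 6).
length, (l, m, n) = length_and_direction_cosines((1, 2, 3), (4, 3, 6))
print(length, l, m, n)
```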
It is now convenient to introduce a triad of unit vectors, denoted by i, j,
and k, that are parallel to and are directed in the positive senses of the x-, y-,
and z-axes, respectively. Here we remind the reader that these are called unit
vectors because they are each of unit length on the x-, y-, and z-length scales.
Notice that the term right-handed that was applied to the system of axes
O{x, y, z} also applies to the triad of vectors i, j, k when taken in this order.
We shall use this idea again later.
An arbitrary vector in any one of the i, j, or k directions may then be
obtained by scaling the length of the appropriate unit vector by a multiplication
factor $\mu$. Thus a vector three times the size of the unit vector i will be
written 3i, whilst a vector twice the size of the unit vector k, but oppositely
directed, will be written —2k.
Returning to Fig. 4-7 we see that in terms of i, j, and k, the vectors OA,
OB, and OC may be written as
$$\mathbf{OA} = a_1\mathbf{i}, \qquad \mathbf{OB} = a_2\mathbf{j}, \qquad \mathbf{OC} = a_3\mathbf{k}.$$
From our ideas of vector addition in a plane the vector OQ lying in the
(x, y)-plane is OQ = OA + AQ or, because vectors may be translated,
OQ = OA + OB. Now in terms of our unit vector notation this may be
written $\mathbf{OQ} = a_1\mathbf{i} + a_2\mathbf{j}$. Turning attention to the plane containing points O,
Q, and P, we see that by the same argument OP = OQ + QP. Again,
because vectors may be translated, QP = OC so that finally, on substituting
for OQ and QP in the equation OP = OQ + QP, we obtain
$$\mathbf{OP} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}.$$ (4-31)
For ease of notation, arbitrary vectors, like unit vectors, will usually be
denoted by a single symbol such as a, a, or r. Thus a general point P in space
with coordinates (x, y, z) will often be written
r = xi + yj + zk. (4-32)
The almost universally accepted convention which we adopt here is to denote
vector quantities by bold face type and scalar quantities by italic type.
Because a vector such as that in Eqn (4-32) identifies a point P in space
it is called a position vector. In the vector representation Eqn (4-31) the numbers
a_1, a_2, and a_3 are called the components of OP.
Two vectors will only be said to be equal if, when written in the form
of Eqn (4-31), their corresponding components are equal. The vector a
= a_1 i + a_2 j + a_3 k will be said to be a scalar multiple λ of vector b
= b_1 i + b_2 j + b_3 k, and we will write a = λb if, and only if, a_1 = λb_1,
a_2 = λb_2, and a_3 = λb_3. In the special case λ = -1 we have a = -b,
showing that |a| = |b|, but that the senses of a and b are opposite. Thus in
Fig. 4-7 we have OP = -PO.
The zero or null vector is the vector whose three components are each
identically zero. It is often denoted by the ordinary symbol 0 instead of a
bold-face 0, since confusion is
unlikely to arise on account of this simplification of the notation. Following
on from our first ideas of vectors, and in accordance with the derivation of
Eqn (4-31), we now define the operations of addition and subtraction of
vectors.
Definition 4-10 Let a and b be arbitrary vectors with components
(a_1, a_2, a_3) and (b_1, b_2, b_3), respectively, so that they may be written
a = a_1 i + a_2 j + a_3 k and b = b_1 i + b_2 j + b_3 k. Then we define the sum
a + b of the two vectors a and b to be the vector (a_1 + b_1)i + (a_2 + b_2)j
+ (a_3 + b_3)k. The difference a - b of the two vectors a and b is defined to
be the vector (a_1 - b_1)i + (a_2 - b_2)j + (a_3 - b_3)k.
Because real numbers are commutative with respect to addition, it follows
directly from this definition that the operation of vector addition is
commutative. That is, we have a + b = b + a for all vectors a and b. When the
subtraction operation is considered the properties of real numbers imply the
result a - b = -(b - a) for all vectors a and b.
Example 4-19 If a = i + j + 2k and b = 3i - 3j + k, then a + b
= (1 + 3)i + (1 - 3)j + (2 + 1)k, showing that a + b = 4i - 2j + 3k.
Reversal of the order of the sum followed by the same argument proves the
commutative property a + b = b + a for these particular vectors. In the
case of subtraction we have a - b = (1 - 3)i + (1 - (-3))j + (2 - 1)k,
showing that a - b = -2i + 4j + k. It is easily established that a - b
= -(b - a).
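The component-wise rules of Definition 4-10 translate directly into code. The sketch below is illustrative only: it assumes vectors are stored as tuples of their i, j, and k components, and the function names add, sub, and neg are not taken from the text.

```python
def add(a, b):
    """Sum a + b, formed component by component."""
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    """Difference a - b, formed component by component."""
    return tuple(x - y for x, y in zip(a, b))

def neg(a):
    """The vector -a, with the same modulus as a but the opposite sense."""
    return tuple(-x for x in a)

a = (1, 1, 2)     # a = i + j + 2k
b = (3, -3, 1)    # b = 3i - 3j + k

print(add(a, b))  # components of a + b
print(sub(a, b))  # components of a - b
```

Comparing add(a, b) with add(b, a), and sub(a, b) with neg(sub(b, a)), exhibits the two properties noted above for these particular vectors.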
Although these particular results could be illustrated diagrammatically,
the vector triangles involved would look essentially the same as those used
earlier in connection with addition and subtraction of complex numbers and
would be arrived at by the same reasoning. Rather than illustrate this specific
case, we present in Fig. 4-10 the results of addition and subtraction of arbitrary
vectors a and b. Because a geometrical projection method is necessary to
illustrate three-dimensional problems on a sheet of paper, such diagrams are
much less useful as a tool than was the case in a plane. Accordingly, we shall
usually concentrate on an analytical approach to vectors, using diagrams
only when they seem likely to be helpful.
Fig. 4-10 Addition and subtraction of vectors.
Two terms worthy of note that are applied to vectors are the names
parallel and anti-parallel. Two vectors will be said to be parallel when their
lines of action are parallel and their senses are the same. Conversely, two
vectors will be said to be anti-parallel when their lines of action are parallel
but their senses are opposite. Thus if a is a vector and μ is a scalar, the vectors
a and μa are parallel if μ > 0 and are anti-parallel if μ < 0. It follows that
two vectors will be parallel if their corresponding direction cosines are equal
and they will be anti-parallel if their corresponding direction cosines are equal
in magnitude but opposite in sign.
Fig. 4-11 Position vectors defining the vector AB.
Example 4-20 The vectors a = i + 2j - 4k and b = 3i + 6j - 12k are
such that we may write 3a = b. Since the scalar 3 > 0 it follows that a and
b are parallel. However the vectors c = i - 3j + k and d = -2i + 6j - 2k
are such that we may write -2c = d and, as the scalar -2 < 0, it follows
that c and d are anti-parallel. By the same argument, the two vectors
p = 3i - j + 2k and q = 6i + 2j + 4k are neither parallel nor anti-parallel,
since for no scalar μ is it true that μp = q.
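The test used in Example 4-20 can be mechanized. The following sketch assumes vectors are tuples of components and that the first vector is not the null vector; the names scalar_multiple and classify, and the tolerance tol, are illustrative assumptions rather than anything defined in the text.

```python
def scalar_multiple(a, b, tol=1e-12):
    """Return mu such that mu*a equals b component by component, or None.

    Assumes a is not the null vector.
    """
    # Take mu from the first non-zero component of a, then test every component.
    mu = next(bc / ac for ac, bc in zip(a, b) if ac != 0)
    if all(abs(mu * ac - bc) <= tol for ac, bc in zip(a, b)):
        return mu
    return None

def classify(a, b):
    """Describe b relative to a as parallel, anti-parallel, or neither."""
    mu = scalar_multiple(a, b)
    if mu is None or mu == 0:
        return "neither"
    return "parallel" if mu > 0 else "anti-parallel"

print(classify((1, 2, -4), (3, 6, -12)))
print(classify((1, -3, 1), (-2, 6, -2)))
print(classify((3, -1, 2), (6, 2, 4)))
```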
The length of the vector AB which we have already denoted by |AB|
is a useful quantity and, as with complex numbers, is called the modulus of
the vector AB. Its formal definition follows.
Definition 4-11 The modulus |a| of the vector a = a_1 i + a_2 j + a_3 k is
the positive square root
|a| = (a_1^2 + a_2^2 + a_3^2)^{1/2}.
It is an immediate consequence of this definition that any vector r with
direction cosines {l, m, n} may be written in the form
r = |r|(li + mj + nk). (4-33)
The proof of this is obvious for by definition, l|r| is the x-component of r,
m|r| is the y-component, and n|r| is the z-component. The form of Eqn (4-33)
shows that any vector may be expressed as the product of a scalar (its modulus)
and a unit vector defining its direction and sense.
When it is necessary to define an arbitrary vector AB in space, this may
easily be accomplished by using position vectors a and b to identify its end
points A and B. This is illustrated in Fig. 4-11 from which, by the rules of
vector addition, we may write
OA + AB = OB
or,
AB = OB - OA = b - a.
Examination of this simple but useful result suggests that an accurate name
for the vector AB would be the 'position vector of B relative to A', since in
this role it is A that plays the part of the origin. This more exact name is
seldom used since the symbol AB is sufficiently clear as it stands.
Example 4-21 Let points A and B be identified by the position vectors
a = -2i - 3j + k and b = 3i - j + 4k, respectively. Find the vector AB
together with its modulus and direction cosines.
Solution The diagram in Fig. 4-11 can be taken to represent this situation
showing that vector AB = b - a. Substituting for the values of a and b,
we find AB = (3i - j + 4k) - (-2i - 3j + k), whence AB = 5i + 2j + 3k.
Then |AB| = (5^2 + 2^2 + 3^2)^{1/2} = √38 after which the usual argument
establishes that l = 5/√38, m = 2/√38, and n = 3/√38.
By considering the plane containing the vectors a, b, and b - a in Fig.
4-11, the arguments that established the triangle inequalities for complex
numbers also establish them for arbitrary space vectors. Hence for arbitrary
vectors a and b we have
| |a| - |b| | ≤ |a + b| ≤ |a| + |b|. (4-34)
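A numerical spot check of inequality (4-34) is easily made. The sketch below uses two arbitrarily chosen vectors stored as tuples; a single check of this kind illustrates the inequality but of course proves nothing, and the helper names are illustrative only.

```python
import math

def modulus(v):
    """Modulus of a vector given as a tuple of components."""
    return math.sqrt(sum(c * c for c in v))

def add(a, b):
    """Component-wise sum of two vectors."""
    return tuple(x + y for x, y in zip(a, b))

a = (1, 1, 2)
b = (3, -3, 1)

lower = abs(modulus(a) - modulus(b))   # | |a| - |b| |
middle = modulus(add(a, b))            # |a + b|
upper = modulus(a) + modulus(b)        # |a| + |b|

print(lower <= middle <= upper)
```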
Finally, to close this section, let us find the angle between two vectors
a and b with the direction cosines {l_1, m_1, n_1} and {l_2, m_2, n_2}, respectively.
When the lines of action of the vectors intersect the angle θ is well defined
and, by convention, is always chosen to lie in the interval [0, π]. If the lines
of action of two vectors do not intersect then they are merely translated until
they do, when the angle θ is defined as above. It will suffice to consider the
angle between two unit vectors directed along a and b since the length of the
vectors will obviously not influence the angle between them. From Eqn
(4-33), these unit vectors are seen to be (l_1 i + m_1 j + n_1 k) and (l_2 i + m_2 j + n_2 k).
These are shown in Fig. 4-12. They have their tips P and Q at the respective
points (l_1, m_1, n_1) and (l_2, m_2, n_2).
Fig. 4-12 Angle between two lines.
Now, by the cosine rule
|PQ|^2 = |OP|^2 + |OQ|^2 - 2|OP| . |OQ| cos θ, (4-35)
but |OP| = |OQ| = 1, and by Eqn (4-29), |PQ|^2 = (l_2 - l_1)^2 + (m_2 - m_1)^2
+ (n_2 - n_1)^2, whilst by Eqn (4-28), l_1^2 + m_1^2 + n_1^2 = l_2^2 + m_2^2 + n_2^2 = 1.
Consequently, substituting into Eqn (4-35) and simplifying, we find the
desired result
cos θ = l_1 l_2 + m_1 m_2 + n_1 n_2. (4-36)
The angle of inclination θ follows directly from this equation. The restriction
of the angle between the vectors to the interval [0, π] means that in Fig. 4-12,
it is the angle θ that is selected, and not the angle θ'.
As a particular case, if
l_1 l_2 + m_1 m_2 + n_1 n_2 = 0, (4-37)
then the two vectors a and b must be orthogonal.
Example 4-22 Find the angle of inclination θ between the vectors
a = i + 2j + 3k and b = 2i - j - k.
Solution Here |a| = √14, |b| = √6, so that the direction cosines {l_1, m_1, n_1}
of a are l_1 = 1/√14, m_1 = 2/√14, n_1 = 3/√14 whilst the direction cosines
{l_2, m_2, n_2} of b are l_2 = 2/√6, m_2 = -1/√6, n_2 = -1/√6. Hence by Eqn
(4-36), the angle θ is the solution of the equation
cos θ = (1/√14)(2/√6) + (2/√14)(-1/√6) + (3/√14)(-1/√6) = -3/(2√21),
or
θ = arc cos (-3/(2√21)).
On account of the restriction of θ to the interval [0, π] it finally follows that
θ ≈ 1.904 rad.
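The route from direction cosines to the angle θ, as expressed by Eqn (4-36), may be sketched in Python as follows. The function name angle_between is an illustrative assumption, vectors are taken to be non-zero tuples of components, and the clamping step is a numerical safeguard that does not appear in the text.

```python
import math

def angle_between(a, b):
    """Angle in [0, pi] between non-zero vectors a and b, using Eqn (4-36)."""
    ra = math.sqrt(sum(c * c for c in a))
    rb = math.sqrt(sum(c * c for c in b))
    # Sum of products of corresponding direction cosines.
    cos_theta = sum((ac / ra) * (bc / rb) for ac, bc in zip(a, b))
    # Guard against rounding error pushing the value just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))

theta = angle_between((1, 2, 3), (2, -1, -1))
print(theta)
```

Because math.acos returns values in the interval [0, π], the convention adopted above for θ is respected automatically.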
4-7 Scalar and vector products
If a = a_1 i + a_2 j + a_3 k is an arbitrary vector and λ is a scalar, then we have
already defined the product λa to be the vector λa = λa_1 i + λa_2 j + λa_3 k.
Hence the effect of multiplying a vector by a scalar is to scale its length by
the factor |λ| without changing its line of action, the sense being reversed
when λ < 0. The result of this product is to generate a
vector. We must now discuss the multiplication of two vectors.
Here three-dimensional vector algebra differs radically from the vector
algebra of complex numbers. With complex numbers there is only one
multiplication operation defined, and the product of two complex numbers is
always a complex number. In the case of vectors we shall see that two multi-
plication operations are defined for a pair of vectors. One operation called a
scalar product generates a scalar, whereas the other operation called a vector
product generates a vector. The operation of division is not defined for vectors.
The scalar product of two vectors is a generalization of the notion of the
orthogonal projection of a line element onto another line and is suggested by
Eqn (4-36). Its definition follows.
Definition 4-12 The scalar product of the two vectors a = a_1 i + a_2 j + a_3 k
and b = b_1 i + b_2 j + b_3 k is written a . b and is defined to be the scalar
quantity
a . b = a_1 b_1 + a_2 b_2 + a_3 b_3.
Because of the notation used, a scalar product is often colloquially
called the dot product. Some books favour the notation (a, b) for the scalar
product when it is then usually called the inner product of vectors a and b.
To exhibit the relation of a . b to Eqn (4-36) we first divide a . b by the product
of the moduli |a||b| to get
a . b/(|a||b|) = (a_1/|a|)(b_1/|b|) + (a_2/|a|)(b_2/|b|) + (a_3/|a|)(b_3/|b|).
Then, from the definition of direction cosines, we recognize that this may be
written
a . b/(|a||b|) = l_1 l_2 + m_1 m_2 + n_1 n_2, (4-38)
where {l_1, m_1, n_1} are the direction cosines of a and {l_2, m_2, n_2} are the
direction cosines of b. If θ is the angle of inclination between a and b then, by
virtue of Eqn (4-36), expression (4-38) becomes
a . b = |a||b| cos θ. (4-39)
This may be taken as an alternative definition of the scalar product a . b.
Alternative definition 4-13 The scalar product of the two vectors
a and b is written a . b and is defined to be the scalar quantity
a . b = |a||b| cos θ,
where θ is the angle between the vectors.
Notice that it is a direct consequence of the definition that the scalar
product of two vectors is commutative. That is, we have a . b = b . a for any
two vectors a and b.
Because of this property we shall sometimes, and without confusion,
write a^2 with the understanding that a^2 = a . a. In practice Definition 4-12
is most used to find the scalar product since it relates the scalar product
directly to the components of the vectors involved. The alternative form set
out in Definition 4-13 is used to find the angle between the two vectors once
the scalar product is known.
Example 4-23 Find the scalar product of the vectors a = -2i - 3j + k
and b = -i + j + 3k and use the result to find the angle between a and b.
Solution From Definition 4-12 we have a . b = (-2)(-1) + (-3)(1)
+ (1)(3) = 2. Now |a| = √14 and |b| = √11, so that substituting in
Definition 4-13 we have 2 = √14 . √11 cos θ and hence cos θ = 2/√154,
or θ = arc cos (2/√154).
Consider the scalar products of the unit vectors i, j, and k. Since these are
mutually orthogonal the angle between any two is π/2. It follows from
Definition 4-13 that the scalar product of any two different unit vectors from
this triad is zero. As each of the vectors i, j, and k is parallel to itself, when
forming the scalar product of one of these vectors with itself we must set
θ = 0. Thus as |i| = |j| = |k| = 1, it follows from Definition 4-13 that
i . i = j . j = k . k = 1. In summary we have these important results, which
should be memorized since they are fundamental to everything that follows:
i . i = j . j = k . k = 1,
i . j = j . i = 0,
i . k = k . i = 0,
j . k = k . j = 0.
These results are conveniently combined in Table 4-2. Each entry is to be
interpreted as the scalar product of the vector at the left of the row of the
entry, with the vector at the top of the column of the entry.
Table 4-2 Table of scalar products of i, j, and k
First         Second member
member        i     j     k
i             1     0     0
j             0     1     0
k             0     0     1
The scalar product of two vectors may be deduced using Table 4-2 by
simple algebraic manipulation without the use of Definition 4-12. To see this
consider the vectors a = a_1 i + a_2 j + a_3 k and b = b_1 i + b_2 j + b_3 k. First
form their scalar product
a . b = (a_1 i + a_2 j + a_3 k) . (b_1 i + b_2 j + b_3 k),
and then expand the right-hand side as though ordinary algebraic quantities
were involved to obtain
a . b = (a_1 i) . (b_1 i) + (a_1 i) . (b_2 j) + (a_1 i) . (b_3 k) + (a_2 j) . (b_1 i)
+ (a_2 j) . (b_2 j) + (a_2 j) . (b_3 k) + (a_3 k) . (b_1 i) + (a_3 k) . (b_2 j)
+ (a_3 k) . (b_3 k).
Next, recognizing that the scalars a_i, b_j may be taken to the front of each
scalar product involved, rewrite the result thus:
a . b = a_1 b_1 i . i + a_1 b_2 i . j + a_1 b_3 i . k + a_2 b_1 j . i + a_2 b_2 j . j
+ a_2 b_3 j . k + a_3 b_1 k . i + a_3 b_2 k . j + a_3 b_3 k . k.
Finally, using Table 4-2, this reduces to the desired result
a . b = a_1 b_1 + a_2 b_2 + a_3 b_3.
In practice the intermediate working is always omitted and the result of a
scalar product is written on sight by retaining only the products involving
i . i, j . j, and k . k.
Example 4-24 Determine the scalar products of these pairs of vectors :
(a) a = i - 3j + k, b = -i + j - 3k;
(b) a = 2i + j - k, b = -i + j - k;
(c) a = 2i - j + 3k, b = -2i + j - 3k;
(d) a = i + 2j - k, b = i + 2j - k.
Solutions To show the application of scalar products of unit vectors we shall
retain the notation i . i, j . j, and k . k in the first part of each calculation to
indicate the origin of the terms involved. The terms involving products such
as i . j, i . k, . . ., will be omitted as these scalar products are zero. The result
will usually be written down on sight without any intermediate working.
(a) a . b = (i - 3j + k) . (-i + j - 3k)
= (1)(-1)i . i + (-3)(1)j . j + (1)(-3)k . k
= -1 - 3 - 3 = -7.
(b) a . b = (2i + j - k) . (-i + j - k)
= (2)(-1)i . i + (1)(1)j . j + (-1)(-1)k . k
= -2 + 1 + 1 = 0.
Thus a and b are orthogonal.
(c) a . b = (2i - j + 3k) . (-2i + j - 3k)
= (2)(-2)i . i + (-1)(1)j . j + (3)(-3)k . k
= -4 - 1 - 9 = -14.
(d) a . b = (i + 2j - k) . (i + 2j - k)
= (1)(1)i . i + (2)(2)j . j + (-1)(-1)k . k
= 1 + 4 + 1 = 6.
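The four results of Example 4-24 can be reproduced with a one-line implementation of Definition 4-12. This is a sketch only; the function name dot and the dictionary used to hold the four pairs are illustrative choices, and vectors are assumed to be tuples of components.

```python
def dot(a, b):
    """Scalar product a . b = a_1 b_1 + a_2 b_2 + a_3 b_3 (Definition 4-12)."""
    return sum(x * y for x, y in zip(a, b))

pairs = {
    "a": ((1, -3, 1), (-1, 1, -3)),
    "b": ((2, 1, -1), (-1, 1, -1)),
    "c": ((2, -1, 3), (-2, 1, -3)),
    "d": ((1, 2, -1), (1, 2, -1)),
}

results = {label: dot(a, b) for label, (a, b) in pairs.items()}
print(results)
```

A zero result, as in case (b), signals that the two vectors concerned are orthogonal.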
Example (d) above is a special case of the scalar product of a vector with
itself and either from Definition 4-12 or 4-13 we see that for an arbitrary
vector a,
a . a = |a|^2. (4-40)
In words, 'the scalar product of a vector with itself is equal to the square of
the modulus of that vector'. This simple result is often valuable when finding
a unit vector parallel to a given arbitrary vector a. To see how this comes
about, if we divide a by its modulus |a| to form the vector â = a/|a|, then
result (4-40) shows that â . â = 1 and so â is a unit vector.
Example 4-25 Find a unit vector â parallel to the vector a = 3i - j - 2k.
Use the result to determine the projection of the vector b = 2i + 3j + k in
the direction of a.
Solution Here |a| = √14 so that the desired unit vector â = a/√14
= (3/√14)i - (1/√14)j - (2/√14)k. Now the projection of vector b along
a is by definition the length l of vector b when projected normally onto the
line determined by a. Thus it is l = |b| cos θ, where θ is the angle between
b and a. Since |â| = 1 we may write this as l = |b||â| cos θ or, by Definition
4-13, as l = b . â. Hence in this problem l = (2i + 3j + k) . â = 1/√14.
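The two steps of Example 4-25, normalizing a and then projecting b, may be sketched as follows. The names unit and projection are illustrative, vectors are assumed to be tuples of components, and a is assumed not to be the null vector.

```python
import math

def dot(a, b):
    """Scalar product of two vectors given as tuples of components."""
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    """Unit vector parallel to a, formed by dividing a by its modulus."""
    r = math.sqrt(dot(a, a))   # |a| follows from a . a = |a|^2, Eqn (4-40)
    return tuple(x / r for x in a)

a = (3, -1, -2)
b = (2, 3, 1)

a_hat = unit(a)
projection = dot(b, a_hat)     # length of b projected onto the line of a
print(a_hat, projection)
```

A negative value of the projection would indicate that the angle between b and a exceeds π/2.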
It follows from the definition of a scalar product of two vectors and from
the properties of real numbers, that if a, b, and c are three arbitrary vectors,
then
a . (b + c) = a . b + a . c.
This is the distributive law for the scalar product of vectors.
Expressions of the form a . b . c, a . b . c . d, . . ., are meaningless since
the scalar product is only defined between a pair of vectors. Note also that
division by vectors is not defined, since although we may write a . b = μ, it
makes no sense to write either a = μ/b or b = μ/a.
The other form of product of two vectors is the vector product. We shall
denote the vector product of vectors a and b by a × b. Again because of the
notation this is often colloquially called the cross product of two vectors.
Other notations in use for the vector product are [a, b] and a ∧ b. In
preparation for the definition of a × b we now introduce a unit vector n̂ that is
normal (i.e. orthogonal) to the plane defined by the vectors a and b, and
whose sense is such that a, b, and n̂, in this order, form a right-handed set of
vectors. Here, although a, b, and n̂ are not necessarily mutually orthogonal,
we use right-handedness exactly as was defined at the start of Section 4-6.
Definition 4-14 The vector product of vectors a and b will be written
a × b and is defined to be the vector quantity
a × b = |a||b| sin θ n̂,
where θ is the angle between vectors a and b with sin θ ≥ 0, and n̂ is a unit
vector normal to the plane of a and b such that a, b, and n̂, in this order, form
a right-handed set of vectors.
This shows that the vector a × b is normal to both a and b and has
magnitude |a||b| sin θ. The first interesting and unusual feature of this form
of product is that it is not commutative. If a, b, n̂, in this order, form a
right-handed set for the definition of a × b, then for the definition of b × a it is
necessary to take for the right-handed set the vectors b, a, -n̂, in the stated
order. The immediate consequence is the important general result that if
a and b are arbitrary vectors, then
a × b = -(b × a). (4-41)
In contrast with the scalar product, it is easily seen that the vector product
of parallel vectors is identically zero, whereas the vector product of non-zero
orthogonal vectors is non-zero. A simple calculation gives Table 4-3 of vector products
of the unit vectors i, j, and k. The left-hand column identifies the first member
of the vector product and the top row identifies the second member of the
vector product. The corresponding entry in the table gives the result of the
vector product. The entries along the diagonal are all seen to be the zero or
null vector.
Table 4-3 Table of vector products of i, j, and k
First         Second member
member        i     j     k
i             0     k    -j
j            -k     0     i
k             j    -i     0
If we take, for example, the first element in the left-hand column and the
last element in the top row, we see that i × k = -j. In many respects it is
easier to memorize these three results:
i × j = k, j × k = i, k × i = j, (4-42)
and then to use property (4-41), than to remember Table 4-3 complete. The
order of the vectors occurring in these key relations can be remembered by
making the cyclic permutations
i j k
j k i
k i j
As with scalar products, this table of vector products may be used to
calculate the vector product of any two vectors expressed in component form.
Consider the vector product a × b where a = a_1 i + a_2 j + a_3 k and
b = b_1 i + b_2 j + b_3 k. Proceeding as though ordinary algebraic quantities
were involved we write
a × b = (a_1 i + a_2 j + a_3 k) × (b_1 i + b_2 j + b_3 k)
= (a_1 i) × (b_1 i) + (a_1 i) × (b_2 j) + (a_1 i) × (b_3 k)
+ (a_2 j) × (b_1 i) + (a_2 j) × (b_2 j) + (a_2 j) × (b_3 k)
+ (a_3 k) × (b_1 i) + (a_3 k) × (b_2 j) + (a_3 k) × (b_3 k),
working on the assumption that vector multiplication is distributive over
addition. Next we recognize that the scalars a_i, b_j may be taken out in front
of each vector product that is involved so that the expression becomes
a × b = a_1 b_1 i × i + a_1 b_2 i × j + a_1 b_3 i × k + a_2 b_1 j × i + a_2 b_2 j × j
+ a_2 b_3 j × k + a_3 b_1 k × i + a_3 b_2 k × j + a_3 b_3 k × k.
Finally, using Table 4-3 and collecting together the i, j, and k terms, we
obtain
a × b = (a_2 b_3 - a_3 b_2)i + (a_3 b_1 - a_1 b_3)j + (a_1 b_2 - a_2 b_1)k. (4-43)
This is often taken as the definition of the vector product a × b in place
of our Definition 4-14. Expression (4-43) may be considerably simplified if
the concept of a determinant is used. Before showing this we must digress
slightly to define this term.
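Expression (4-43) is already in a form suited to computation. The sketch below implements it directly and uses it to recover the key relations (4-42) and the anti-commutative property (4-41); the function name cross and the representation of vectors as tuples are illustrative assumptions.

```python
def cross(a, b):
    """Vector product a x b computed from the component form, Eqn (4-43)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)

i = (1, 0, 0)
j = (0, 1, 0)
k = (0, 0, 1)

print(cross(i, j), cross(j, k), cross(k, i))   # the three products of (4-42)
print(cross(i, k))                             # reversed order changes the sign
```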
Definition 4-15 Let a, b, c, and d be any four real numbers. Consider
the two-row by two-column array of these numbers
a b
c d. (A)
Define the expression
| a b |
| c d | (B)
that is associated with this array by the identity
| a b |
| c d | = (ad - cb). (C)
We define the second-order determinant associated with the array (A) to be
the number represented in symbols by (B) and having the value defined by
(C). The process of expressing the left-hand side of (C) in the form of the
right-hand side is called expanding the determinant.
Example 4-26 Evaluate the second-order determinants
(a) | 1 7 |   (b) | 0 -1 |   (c) | 2 6 |
    | 3 9 |       | 4  2 |       | 1 3 |
Solution The values of the determinants follow directly from the definition:
(a) | 1 7 |
    | 3 9 | = (1)(9) - (3)(7) = 9 - 21 = -12;
(b) | 0 -1 |
    | 4  2 | = (0)(2) - (4)(-1) = 0 + 4 = 4;
(c) | 2 6 |
    | 1 3 | = (2)(3) - (1)(6) = 6 - 6 = 0.
Definition 4-16 Let a_i, b_i, and c_i with i = 1, 2, 3 be any set of nine
real numbers. Consider the three-row by three-column array of these numbers
a_1 a_2 a_3
b_1 b_2 b_3
c_1 c_2 c_3. (A)
Define the expression
| a_1 a_2 a_3 |
| b_1 b_2 b_3 |
| c_1 c_2 c_3 | (B)
that is associated with this array to be the single number that is determined by
the identity
| a_1 a_2 a_3 |
| b_1 b_2 b_3 | = a_1 | b_2 b_3 | - a_2 | b_1 b_3 | + a_3 | b_1 b_2 |
| c_1 c_2 c_3 |       | c_2 c_3 |       | c_1 c_3 |       | c_1 c_2 |   (C)
We define the third-order determinant associated with the array (A) to be the
number represented in symbols by (B) and having the value defined by (C).
Example 4-27 Evaluate the third-order determinant
    | 3 -2 -7 |
Δ = | 2  1  2 |
    | 2  1  1 |
Solution From the definition,
Δ = (3) | 1 2 | - (-2) | 2 2 | + (-7) | 2 1 |
        | 1 1 |        | 2 1 |        | 2 1 |
Expanding the three second-order determinants and adding, we obtain the
desired result
Δ = 3(1 - 2) + 2(2 - 4) - 7(2 - 2) = -7.
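The expansions of Definitions 4-15 and 4-16 may be written out in code exactly as they stand. In this sketch a determinant is supplied as a tuple of rows; the function names det2 and det3 are illustrative and are not notation from the text.

```python
def det2(m):
    """Second-order determinant: | a b ; c d | = ad - cb (Definition 4-15)."""
    (a, b), (c, d) = m
    return a * d - c * b

def det3(m):
    """Third-order determinant expanded by its first row (Definition 4-16)."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = m
    return (a1 * det2(((b2, b3), (c2, c3)))
            - a2 * det2(((b1, b3), (c1, c3)))
            + a3 * det2(((b1, b2), (c1, c2))))

delta = det3(((3, -2, -7),
              (2, 1, 2),
              (2, 1, 1)))
print(delta)
```

Note the alternation of signs in det3, which mirrors the signs attached to a_1, a_2, and a_3 in identity (C).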
It is helpful to classify determinants in some simple way, which the
next definition achieves.
Definition 4-17 We define the order of a determinant to be the number
of terms that lie on a diagonal drawn from the top left-hand corner to the
bottom right-hand corner. The values of these terms are immaterial.
Thus in Example 4-26 the determinants are second-order, whereas in
Example 4-27 the determinant is third-order, and is evaluated in terms of
three second-order determinants.
We are now able to give the promised alternative definition of a vector
product.
Alternative definition 4-18 We define the vector product a × b of
the two vectors a = a_1 i + a_2 j + a_3 k and b = b_1 i + b_2 j + b_3 k to be the
formal expansion of the determinant
        | i   j   k   |
a × b = | a_1 a_2 a_3 |
        | b_1 b_2 b_3 |
In this definition we have used the word 'formal' because, although the a_i
and b_i are real numbers, the i, j, and k are unit vectors. Aside from this the
expansion of the third-order determinant is performed exactly as in Example
4-27.
Example 4-28 Determine the vector product a × b where a = i + j - 2k
and b = -2i + 3j + k.
Solution To apply Definition 4-18 we first notice that the components
a_1, a_2, and a_3 of a are 1, 1, and -2 whilst the components b_1, b_2, and b_3 of b
are -2, 3, and 1. Hence
        |  i  j  k |
a × b = |  1  1 -2 |
        | -2  3  1 |
and so
a × b = i | 1 -2 | - j |  1 -2 | + k |  1 1 |
          | 3  1 |     | -2  1 |     | -2 3 |
= 7i + 3j + 5k.
This effectively demonstrates that for most practical purposes Definition
4-18 involves the least manipulation.
It is easily proved that the vector product is distributive, so that for any
three vectors a, b, and c we always have
a × (b + c) = a × b + a × c.
Indeed this is implied by the way in which Eqn (4-43) was derived.
With the introduction of the vector product, mixed products of the form
a . (b × c) become possible. This type of product is known as a triple scalar
product and as it involves the scalar product of a with (b × c) it is seen to be
a scalar. If a = a_1 i + a_2 j + a_3 k, b = b_1 i + b_2 j + b_3 k, and c = c_1 i + c_2 j
+ c_3 k then by combination of Definitions 4-12 and 4-18 we have
                                        | i   j   k   |
a . (b × c) = (a_1 i + a_2 j + a_3 k) . | b_1 b_2 b_3 |
                                        | c_1 c_2 c_3 |
or,
a . (b × c) = a_1(b_2 c_3 - c_2 b_3) - a_2(b_1 c_3 - c_1 b_3) + a_3(b_1 c_2 - c_1 b_2).
The terms on the right-hand side of this expression are the result of expanding
(C) in Definition 4-16, so that they may be re-combined into a determinant to
give the general result
              | a_1 a_2 a_3 |
a . (b × c) = | b_1 b_2 b_3 | (4-44)
              | c_1 c_2 c_3 |
By interchanging rows of the determinant it is readily shown that the
dot . and the cross × in a triple scalar product may be interchanged so that
a . (b × c) = (a × b) . c. (4-45)
Example 4-29 Evaluate the triple scalar product a . (b × c) given that
a = 2i + k, b = i + j + 2k, and c = -i + j.
Solution The components of a, b, and c are, respectively, (2, 0, 1), (1, 1, 2),
and (-1, 1, 0). Hence
              |  2 0 1 |
a . (b × c) = |  1 1 2 | = 2 . (-2) - 0 . (2) + 1 . (2) = -2.
              | -1 1 0 |
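The triple scalar product of Example 4-29, together with the interchange of dot and cross asserted in Eqn (4-45), can be checked with a few lines of Python. The helper names dot, cross, and triple_scalar are illustrative, and vectors are assumed to be tuples of components.

```python
def dot(a, b):
    """Scalar product of two vectors given as tuples of components."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product in component form, as in Eqn (4-43)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def triple_scalar(a, b, c):
    """Triple scalar product a . (b x c)."""
    return dot(a, cross(b, c))

a = (2, 0, 1)
b = (1, 1, 2)
c = (-1, 1, 0)

value = triple_scalar(a, b, c)     # a . (b x c)
swapped = dot(cross(a, b), c)      # (a x b) . c
print(value, swapped)
```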
As our next generalization, we notice that vector products of more than
two vectors are defined provided the order in which these products are to be
carried out is specified by bracketing. As a special case we have the triple
vector product a × (b × c) of the three vectors a, b, and c which differs
from the triple vector product (a × b) × c. The first expression signifies the
vector product of a and (b × c), whilst the second signifies the vector product
of (a × b) and c, and in general these are different vectors.
A straightforward application of Definition 4-18 establishes the following
useful identity from which some interesting results may be derived
a × (b × c) = (a . c)b - (a . b)c. (4-46)
The details of the proof are left to the reader.
Example 4-30 Demonstrate the difference between the triple vector products
a × (b × c) and (a × b) × c by making the identifications a = i, b = i + j,
c = k.
Solution By direct substitution we find that a × (b × c) = i × [(i + j) × k]
and so expanding this result by using Eqn (4-42) gives a × (b × c)
= i × [-j + i] = -k. Similarly, in the second case, (a × b) × c =
[i × (i + j)] × k = k × k = 0.
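Both the conclusion of Example 4-30 and identity (4-46) lend themselves to a numerical check. The sketch below is a verification for one particular choice of vectors, not a proof of the identity; the helper names are illustrative and vectors are assumed to be tuples of components.

```python
def dot(a, b):
    """Scalar product of two vectors given as tuples of components."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product in component form, as in Eqn (4-43)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def scale(s, v):
    """Scalar multiple s v."""
    return tuple(s * x for x in v)

def sub(a, b):
    """Difference a - b."""
    return tuple(x - y for x, y in zip(a, b))

a = (1, 0, 0)   # i
b = (1, 1, 0)   # i + j
c = (0, 0, 1)   # k

first = cross(a, cross(b, c))     # a x (b x c)
second = cross(cross(a, b), c)    # (a x b) x c
identity_rhs = sub(scale(dot(a, c), b), scale(dot(a, b), c))   # (a . c)b - (a . b)c
print(first, second, identity_rhs)
```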
4-8 Geometrical applications
This section illustrates something of the application of vectors to elementary
geometry, and gives some simple but useful results. First we consider the
representation of a straight line in vector form, and then show how the single
vector equation may be reduced to the more familiar set of three Cartesian
equations.
The straight line
Consider the problem of determining the equation of a straight line given that
it passes through the point A with position vector a relative to O, and is
parallel to vector b. We shall denote the position vector of a general point P
on the line by r as shown in Fig. 4-13.
Fig. 4-13 Straight line through A parallel to b.
By the rules of vector addition we have
OP = OA + AP
or,
r = a + AP.
However, as the straight line through A is parallel to the free vector b,
it follows that for any point P on the line there is a scalar λ such that we can
write AP = λb. Applying this result to the equation above we see that the
vector equation for the straight line becomes
r = a + λb. (4-47)
The scalar λ in this equation is simply a parameter, and different values of
λ will determine different points on the line. To express this result in Cartesian
form, set r = xi + yj + zk, a = a_1 i + a_2 j + a_3 k and b = b_1 i + b_2 j + b_3 k,
when Eqn (4-47) reduces to
xi + yj + zk = a_1 i + a_2 j + a_3 k + λ(b_1 i + b_2 j + b_3 k).
This vector equation implies three scalar equations by virtue of the equality
of its i, j, and k components. Hence we arrive at the three scalar equations
x = a_1 + λb_1 (i-component)
y = a_2 + λb_2 (j-component)
z = a_3 + λb_3 (k-component).
If these are each solved for λ and equated, we obtain the more familiar result
(x - a_1)/b_1 = (y - a_2)/b_2 = (z - a_3)/b_3. (4-48)
Equations (4-48) are the standard Cartesian form for the equations of a
straight line. Notice that the coefficients of x, y, and z in Eqn (4-48) are all
unity; that b_1, b_2, and b_3 are then the direction ratios of b and a_1, a_2, and a_3
define a point on the line. Equations (4-48) are sometimes expressed in the
form of three simultaneous equations relating x and y, x and z, and y and z.
This follows by cross-multiplying different pairs of expressions in Eqn (4-48).
Example 4-31 Find the vector equation of the line through the point with
position vector i + 3j - k which is parallel to the vector 2i + 3j + 4k.
Determine the point on the line corresponding to λ = 2 in the resulting
equation. Also express the vector equation of the line in standard Cartesian
form.
Solution From Eqn (4-47) we have
r = (i + 3j - k) + λ(2i + 3j + 4k)
or,
r = (1 + 2λ)i + 3(1 + λ)j + (4λ - 1)k.
This is the vector equation of the line, and setting λ = 2 determines the
point r = 5i + 9j + 7k. To express the equation of the line in Cartesian
form we appeal to Eqns (4-48) and use the fact that a = i + 3j - k and
b = 2i + 3j + 4k. Hence a_1 = 1, a_2 = 3, a_3 = -1, and b_1 = 2, b_2 = 3,
and b_3 = 4, so that the desired Cartesian equations are
(x - 1)/2 = (y - 3)/3 = (z + 1)/4.
As a check we can also use these equations to determine the point
corresponding to λ = 2. We must solve the three equations
(x - 1)/2 = 2, (y - 3)/3 = 2, (z + 1)/4 = 2,
which give x = 5, y = 9, and z = 7. These are of course the coordinates of
the tip of the position vector r = 5i + 9j + 7k which confirms our previous
result.
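The parametric form r = a + λb of Eqn (4-47) and the check through the Cartesian form (4-48) may be sketched as follows. The function name point_on_line is illustrative, vectors are assumed to be tuples of components, and the recovery of λ from the Cartesian ratios assumes that no component of b is zero.

```python
def point_on_line(a, b, lam):
    """Position vector r = a + lam * b of the point with parameter lam."""
    return tuple(ai + lam * bi for ai, bi in zip(a, b))

a = (1, 3, -1)   # a point on the line
b = (2, 3, 4)    # direction ratios of the line

r = point_on_line(a, b, 2)
print(r)

# Each ratio (x - a_1)/b_1, (y - a_2)/b_2, (z - a_3)/b_3 should return the same
# parameter value for a point lying on the line.
ratios = [(ri - ai) / bi for ri, ai, bi in zip(r, a, b)]
print(ratios)
```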
The same approach may be used if the line is required to pass through the
two points A and B with position vectors α and β, respectively. For then the
line passes through α and is parallel to the vector β - α which is just a
segment of the line itself. Hence we identify a with α and b with β - α, after
which the argument proceeds as before.
In the next example we illustrate how the non-standard Cartesian equa-
tions of a straight line may be re-interpreted in vector form.
Example 4-32 The equations
(2x - 1)/3 = (y + 2)/3 = (-z + 4)/2
determine a straight line. Express them in vector form and find the direction
ratios of the line.
Solution To express the equations in standard Cartesian form we must
first make the coefficients of x, y, and z each equal to unity. Hence we rewrite
the equations:
(x - 1/2)/(3/2) = (y + 2)/3 = (z - 4)/(-2).
The vector a then has components a_1 = 1/2, a_2 = -2, a_3 = 4 and the vector
b has the components b_1 = 3/2, b_2 = 3, b_3 = -2. These latter three numbers
are the desired direction ratios. The vector equation of the straight line itself is
r = (1/2)(1 + 3λ)i + (3λ - 2)j + 2(2 - λ)k.
(Why?)
On occasion it is necessary to determine the perpendicular distance p
from a point C with position vector c to the line L with equation r = a + λb.
This can be done by applying Pythagoras' theorem in Fig. 4-14.
Fig. 4-14 Perpendicular distance of point from line.
We have the obvious result
p^2 = (AC)^2 - (AB)^2
but AC = c - a so that (AC)^2 = |AC|^2 = (c - a) . (c - a), whilst length
AB is the projection of AC onto the line L. Now the unit vector along L is
b/|b| so that AB = (c - a) . b/|b| and thus
(AB)^2 = ((c - a) . b/|b|)^2.
Combining these results gives
p^2 = (c - a) . (c - a) - ((c - a) . b/|b|)^2, (4-49)
from which p may be deduced.
Example 4-33 Find the distance of the point with position vector i + j + k
from the line r = (i + 2j + k) + A(i - 2j + k).
Solution In the notation leading to Eqn (4-49) we have a = i + 2j + k,
b = i − 2j + k, and c = i + j + k. Hence c − a = −j and thus
(c − a) · (c − a) = (−j) · (−j) = 1. Also (c − a) · b = (−j) · (i − 2j + k) = 2
so that ((c − a) · b)² = 4, whilst |b|² = 6. Hence
$$\left(\frac{(\mathbf{c} - \mathbf{a})\cdot\mathbf{b}}{|\mathbf{b}|}\right)^2 = \frac{4}{6}$$
and so from Eqn (4-49), p² = 1 − 2/3 = 1/3, or p = 1/√3 as p is essentially
positive.
Fig. 4-15 Vector equation of a plane n · r = |n|p.
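As a numerical check on Eqn (4-49), the following minimal sketch evaluates the
formula with the data of Example 4-33. The function names are illustrative
choices, not notation from the text.

```python
import math

def dot(u, v):
    """Scalar product of two vectors given as sequences of components."""
    return sum(x * y for x, y in zip(u, v))

def distance_point_to_line(c, a, b):
    """Perpendicular distance p from point c to the line r = a + lambda*b."""
    ca = [ci - ai for ci, ai in zip(c, a)]
    # Eqn (4-49): p^2 = (c - a).(c - a) - ((c - a).b / |b|)^2
    p_squared = dot(ca, ca) - dot(ca, b) ** 2 / dot(b, b)
    return math.sqrt(max(p_squared, 0.0))  # guard against tiny negative rounding

# Data of Example 4-33
p = distance_point_to_line((1, 1, 1), (1, 2, 1), (1, -2, 1))
print(p)  # approximately 0.577, i.e. 1/sqrt(3)
```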
The plane
The equation of a plane is easily determined once it is recognized that a
plane Π is specified when one point on it is known, together with any vector
perpendicular to it. Such a vector, when normalized, is a unit normal to the
plane Π and is unique except for its sign. The ambiguity as to the sign of the
normal is, of course, because a plane has no preferred side. To derive its
equation consider Fig. 4-15.
Let r be the position vector relative to O of a point P on the plane Π, and
n be a vector normal to the plane directed through the plane away from O
so that the corresponding unit normal is n̂ = n/|n|. Further, let the
perpendicular distance ON from the origin O to the plane be p. Then for all
points P we have (OP) cos θ = p. In terms of vectors this is
$$\frac{\mathbf{r}\cdot\mathbf{n}}{|\mathbf{n}|} = p, \tag{4-50}$$
which is just the vector equation of a plane. If the number p in Eqn (4-50) is
positive then the plane lies on the side of the origin towards which n is directed,
otherwise it lies on the opposite side.
To express result (4-50) in Cartesian form let r = xi + yj + zk and the
unit normal n̂ = n/|n| = li + mj + nk, where of course l² + m² + n² = 1.
Equation (4-50) becomes
$$lx + my + nz = p. \tag{4-51}$$
This is the standard Cartesian form of the equation of a plane. Any equation
of this form represents a plane having for its unit normal the vector
li + mj + nk and lying at a perpendicular distance p from the origin. If p = 0
the plane passes through the origin.
Example 4-34 Find the Cartesian equation of the plane containing the point
(1, 2, 3) which is normal to the vector i + 2j + 2k.
Solution First we use Eqn (4-50) to determine p. Since the point (1, 2, 3)
lies in the plane, r = i + 2j + 3k is the position vector of a point in the plane.
The vector normal to the plane in this case is n = i + 2j + 2k, so that
|n| = 3 and the unit normal n̂ = n/|n| = (i + 2j + 2k)/3. This shows that
l = 1/3, m = 2/3, n = 2/3. Hence, substituting into Eqn (4-50),
$$p = \frac{(\mathbf{i} + 2\mathbf{j} + 3\mathbf{k})\cdot(\mathbf{i} + 2\mathbf{j} + 2\mathbf{k})}{3},$$
or p = 11/3. As p > 0, the plane must lie on the side of the origin towards
which n is directed. Substituting in Eqn (4-51) we find the desired Cartesian
form of the equation of the plane:
$$\tfrac{1}{3}x + \tfrac{2}{3}y + \tfrac{2}{3}z = \tfrac{11}{3}.$$
This equation could equally well be written in the non-standard Cartesian
form x + 2y + 2z = 11, though then the constant on the right-hand side is
no longer the perpendicular distance of the plane from the origin.
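The steps of Example 4-34 can be sketched in a few lines. The function name
`plane_standard_form` is a hypothetical helper; it normalizes the given normal
and evaluates Eqn (4-50) at the known point to obtain p.

```python
import math

def plane_standard_form(point, normal):
    """Return (l, m, n, p) for the standard form lx + my + nz = p."""
    size = math.sqrt(sum(c * c for c in normal))
    l, m, n = (c / size for c in normal)
    # Eqn (4-50): p = r.n / |n| for any point r in the plane
    p = sum(x * c for x, c in zip(point, normal)) / size
    return l, m, n, p

# Example 4-34: point (1, 2, 3), normal i + 2j + 2k
l, m, n, p = plane_standard_form((1, 2, 3), (1, 2, 2))
print(l, m, n, p)  # about 0.333, 0.667, 0.667, 3.667
```

If the supplied normal happens to point towards the origin, the value of p
returned by this sketch is negative, in line with the remark following Eqn (4-50).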
Simple geometrical considerations similar to those set out above, when
coupled with the scalar and vector product, enable various useful results to
be derived very quickly. For example, as the angle θ between two planes is
defined to be the angle between their unit normals n̂₁ and n̂₂ it follows that
θ may be obtained from the scalar product n̂₁ · n̂₂ = cos θ. Also the line of
intersection of these two planes is perpendicular to both normals n̂₁ and n̂₂
and so is parallel to the vector t determined by the vector product t = n̂₁ × n̂₂.
Rather than elaborate on these ideas here, a number of problems are given
at the end of the chapter.
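By way of illustration, the two results just stated can be sketched as follows.
The planes z = 0 and x + z = 0 are illustrative choices, not taken from the
text, and the normals are left un-normalized, so the computed t is parallel to
n̂₁ × n̂₂ rather than equal to it.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    """Vector product u x v for three-component vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def angle_between_planes(n1, n2):
    """Angle between two planes, found from the scalar product of their normals."""
    cos_theta = dot(n1, n2) / math.sqrt(dot(n1, n1) * dot(n2, n2))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Illustrative planes (not from the text): z = 0 and x + z = 0
n1, n2 = (0, 0, 1), (1, 0, 1)
theta = angle_between_planes(n1, n2)
t = cross(n1, n2)  # direction of the line of intersection
print(math.degrees(theta), t)
```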
The sphere
Consider a sphere of radius R with its centre at the point A with the position
vector a. Then if r is the position vector of any point on the surface of the
sphere, the modulus of the vector r — a must equal R. In terms of vectors the
equation of the sphere is
$$|\mathbf{r} - \mathbf{a}| = R$$
or, alternatively,
$$(\mathbf{r} - \mathbf{a})\cdot(\mathbf{r} - \mathbf{a}) = R^2. \tag{4-52}$$
If, now, we expand this equation to get
$$\mathbf{r}\cdot\mathbf{r} - 2\mathbf{r}\cdot\mathbf{a} = R^2 - \mathbf{a}\cdot\mathbf{a},$$
and then set r = xi + yj + zk, a = a₁i + a₂j + a₃k and R² − a · a = q,
we obtain the standard Cartesian form of the equation of a sphere
$$x^2 + y^2 + z^2 - 2a_1 x - 2a_2 y - 2a_3 z = q. \tag{4-53}$$
Example 4-35 Find the Cartesian form of equation of the sphere of radius 2
having its centre at a = i + j + 2k.
Solution As r = xi + yj + zk and a = i + j + 2k we have
r − a = (x − 1)i + (y − 1)j + (z − 2)k, whilst R = 2. Hence Eqn (4-52) becomes
$$(x - 1)^2 + (y - 1)^2 + (z - 2)^2 = 4,$$
which is the desired Cartesian form of the equation.
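For comparison with the standard form (4-53), the short sketch below computes
the linear coefficients and the constant q for the sphere of Example 4-35. The
helper name `sphere_cartesian` is an assumption made for illustration.

```python
def sphere_cartesian(centre, radius):
    """Return ((-2*a1, -2*a2, -2*a3), q) for x^2 + y^2 + z^2 - 2a1x - 2a2y - 2a3z = q."""
    linear = tuple(-2 * a for a in centre)
    q = radius ** 2 - sum(a * a for a in centre)  # q = R^2 - a.a
    return linear, q

# Example 4-35: centre i + j + 2k, radius 2
linear, q = sphere_cartesian((1, 1, 2), 2)
print(linear, q)  # (-2, -2, -4) -2
```

Expanding the squares in the example gives x² + y² + z² − 2x − 2y − 4z = −2,
which agrees with these values.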
4-9 Applications to mechanics
This section briefly introduces some of the many situations in mechanics
that are best described vectorially. First is one of the simplest applications
of vectors, that will already be familiar to the reader.
Polygon of forces — resultant
It is known from experiment that when forces F₁, F₂, ..., Fₙ act on a rigid
body through a single point O, their combined effect is equivalent to that of a
single force R, their resultant, which acts through the same point O and is
equal to their vector sum. Such a system of forces acting through a single
point is a concurrent system of forces. Thus we have
$$\mathbf{R} = \mathbf{F}_1 + \mathbf{F}_2 + \cdots + \mathbf{F}_n. \tag{4-54}$$
These forces are often represented in the form of a vector polygon of
forces as shown in Fig. 4-16, in which the senses of the forces Fᵢ are all
similarly directed and are opposite to the sense of R.
Conversely, the vector polygon shows that the vector −R is the additional
force that is required to act through O in order to maintain the system of
forces in equilibrium.
Example 4-36 Forces F₁, F₂, and F₃ have magnitudes 3√3, √14, and 2√6 lb
and act concurrently through a point O along the lines of the vectors
i + j + k, 3i − j + 2k, and −i + 2j + k, respectively. Find the force Q that
must act through O for the system to remain in equilibrium.
Fig. 4-16 Vector polygon.
Solution This is a direct application of the last remark about the vector
polygon of forces, and the only problem is one of scaling. Let us agree that a
vector of unit modulus represents a force of 1 lb. From the conditions of the
question we see that F₁, F₂, and F₃ are respectively directed along the unit
vectors
$$\hat{\mathbf{f}}_1 = \frac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k}), \qquad \hat{\mathbf{f}}_2 = \frac{1}{\sqrt{14}}(3\mathbf{i} - \mathbf{j} + 2\mathbf{k}), \qquad \hat{\mathbf{f}}_3 = \frac{1}{\sqrt{6}}(-\mathbf{i} + 2\mathbf{j} + \mathbf{k}).$$
Using the scale factor we can use these to write
$$\mathbf{F}_1 = 3\sqrt{3}\,\hat{\mathbf{f}}_1 = 3\mathbf{i} + 3\mathbf{j} + 3\mathbf{k},$$
$$\mathbf{F}_2 = \sqrt{14}\,\hat{\mathbf{f}}_2 = 3\mathbf{i} - \mathbf{j} + 2\mathbf{k},$$
$$\mathbf{F}_3 = 2\sqrt{6}\,\hat{\mathbf{f}}_3 = -2\mathbf{i} + 4\mathbf{j} + 2\mathbf{k}.$$
Hence the resultant R = F₁ + F₂ + F₃ = 4i + 6j + 7k. The force necessary
for equilibrium is Q = −R showing that Q = −4i − 6j − 7k.
As |Q| = √101, it follows immediately that the desired force is √101 lb
and acts in the direction of the unit vector q̂, where
$$\hat{\mathbf{q}} = \frac{-1}{\sqrt{101}}(4\mathbf{i} + 6\mathbf{j} + 7\mathbf{k}).$$
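The scaling and summation in Example 4-36 can be checked numerically. This is
only a sketch; `scaled_force` is a hypothetical helper that builds a force of
given magnitude along a given direction.

```python
import math

def scaled_force(magnitude, direction):
    """Force of the given magnitude along the (not necessarily unit) direction."""
    size = math.sqrt(sum(c * c for c in direction))
    return [magnitude * c / size for c in direction]

forces = [
    scaled_force(3 * math.sqrt(3), (1, 1, 1)),
    scaled_force(math.sqrt(14), (3, -1, 2)),
    scaled_force(2 * math.sqrt(6), (-1, 2, 1)),
]
R = [sum(components) for components in zip(*forces)]  # Eqn (4-54)
Q = [-c for c in R]                                   # equilibrating force
magnitude_Q = math.sqrt(sum(c * c for c in Q))
print(R)            # close to [4, 6, 7]
print(magnitude_Q)  # close to sqrt(101), about 10.05
```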
In many problems of statics the centroid or the centre of mass of a system
of particles is of importance. We now define this concept in terms of vectors.
Definition 4-19 The centre of mass of the system of masses m₁, m₂, ...,
mₙ whose position vectors are a₁, a₂, ..., aₙ is at the point G, where G
has the position vector g determined by
$$\mathbf{g} = \frac{m_1\mathbf{a}_1 + m_2\mathbf{a}_2 + \cdots + m_n\mathbf{a}_n}{m_1 + m_2 + \cdots + m_n}.$$
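Definition 4-19 translates directly into a weighted average of position
vectors. The sketch below uses illustrative data that is not from the text:
masses 1 and 3 placed at the origin and at (4, 0, 0).

```python
def centre_of_mass(masses, positions):
    """Position vector g = (sum of m_k * a_k) / (sum of m_k), component by component."""
    total = sum(masses)
    return tuple(
        sum(m * a[k] for m, a in zip(masses, positions)) / total
        for k in range(len(positions[0]))
    )

# Illustrative data (not from the text)
g = centre_of_mass([1, 3], [(0, 0, 0), (4, 0, 0)])
print(g)  # (3.0, 0.0, 0.0)
```

As expected, G lies on the segment joining the two masses, three quarters of
the way towards the heavier one.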
Next we discuss simple problems about relative motions, and relative
velocity.
Relative velocity
Problems involving the motion of one point relative to another, which is
itself moving, occur frequently in mechanics and easily lend themselves to
vector treatment. They are best illustrated by example but first we define
relative velocity.
Definition 4-20 The relative velocity of a point P with velocity u, relative
to the point Q with velocity v, is defined to be the velocity u − v.
Example 4-37 A man walks due east at 4 mile/h and his dog runs north-
east at 12 mile/h. Find the velocity and speed of the man relative to his dog.
Solution Let a unit vector denote a velocity of magnitude 1 mile/h and take
j pointing due north and i pointing due east.
Unit vectors in the directions of motion of the man and dog are then i and
(i + j)/√2. The velocity u of the man is thus u = 4i and the velocity v of the
dog is v = 6√2(i + j). Hence the velocity of the man relative to his dog is
$$\mathbf{u} - \mathbf{v} = 2(2 - 3\sqrt{2})\mathbf{i} - 6\sqrt{2}\,\mathbf{j}.$$
His relative speed is |u − v| = (160 − 48√2)^{1/2} mile/h.
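The arithmetic of Example 4-37 can be confirmed with a brief sketch, taking
the first component east and the second north as in the solution.

```python
import math

u = (4.0, 0.0)  # man: 4 mile/h due east
v = tuple(12 / math.sqrt(2) * c for c in (1, 1))  # dog: 12 mile/h north-east
relative = tuple(ui - vi for ui, vi in zip(u, v))  # Definition 4-20: u - v
speed = math.hypot(*relative)
print(relative)  # about (-4.485, -8.485)
print(speed)     # about 9.60 mile/h
```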
Work done by a force
The scalar product can be used to give a convenient representation of the
work W done by a force F that produces a displacement d of the particle on
which it acts. The work done by a force of magnitude |F| when it displaces a
particle through a distance |d| is defined as the product of the distance
moved and the component of force in the direction of the displacement.
Hence, as W is positive we have
$$W = |\mathbf{F}||\mathbf{d}||\cos\theta|,$$
where θ is the angle of inclination between F and d. So the final result is:
$$W = |\mathbf{F}\cdot\mathbf{d}|. \tag{4-55}$$
Example 4-38 Calculate the work W done by a force F of 12 lb whose line
of action is parallel to 2i + 3j − 2k when it moves its point of application
through a displacement d of 4 ft in a direction parallel to −2i + j − 3k.
Solution The unit vectors parallel to the force F and displacement d are
f̂ = (2i + 3j − 2k)/√17 and d̂ = (−2i + j − 3k)/√14, respectively. Let f̂
denote a force of 1 lb and d̂ a displacement of 1 ft so that
F = 12f̂ = (24i + 36j − 24k)/√17 and d = 4d̂ = (−8i + 4j − 12k)/√14. Then
the work W that is done is
$$W = |\mathbf{F}\cdot\mathbf{d}| = \frac{|(24)(-8) + (36)(4) + (-24)(-12)|}{\sqrt{17}\,\sqrt{14}} = \frac{240}{\sqrt{238}} \approx 15.6 \text{ ft lb}.$$
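A numerical evaluation of Eqn (4-55) for the data of Example 4-38 is sketched
below. Note that the scalar product carries the factor 1/(√17 √14) from the
two unit vectors, so the result cannot exceed |F||d| = 48 ft lb. The helper
`along` is a hypothetical name.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def along(magnitude, direction):
    """Vector of the given magnitude parallel to the given direction."""
    size = math.sqrt(dot(direction, direction))
    return [magnitude * c / size for c in direction]

F = along(12, (2, 3, -2))   # 12 lb force
d = along(4, (-2, 1, -3))   # 4 ft displacement
W = abs(dot(F, d))          # Eqn (4-55)
print(W)  # about 15.56 ft lb, i.e. 240/sqrt(238)
```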
We now turn to applications of the vector product. One of the easiest
occurs in the determination of the angular velocity of a point rotating about a
fixed axis.
Angular velocity
Consider a rigid body rotating with a constant spin Ω rad/s about a fixed
axis L. Fig. 4-17 represents a point P in such a body, having the position vector
d relative to a point O on the spin axis L. Point Q is the foot of the
perpendicular from P to the line L.
The vector Ω parallel to L with magnitude Ω and sense determined by a
right-hand screw rule with respect to L and the direction of the spin Ω is
called the angular velocity of the body. The instantaneous linear velocity v
of point P with position vector d is obviously of magnitude Ω(QP), in a
direction tangent to the dotted circle in Fig. 4-17. It is easily seen that we
may rewrite this as
$$|\mathbf{v}| = |\boldsymbol{\Omega}||\mathbf{d}|\sin\theta,$$
where θ is the angle between Ω and d, or as
$$\mathbf{v} = \boldsymbol{\Omega}\times\mathbf{d}. \tag{4-56}$$
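Equation (4-56) is easily tried out numerically. In the sketch below the spin
of 2 rad/s about the z-axis and the point (1, 0, 5) are illustrative values,
not taken from the text; the point lies at unit distance from the axis, so a
speed of 2 is expected.

```python
def cross(u, v):
    """Vector product u x v for three-component vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

omega = (0, 0, 2)  # illustrative: 2 rad/s about the z-axis
d = (1, 0, 5)      # illustrative point, at distance QP = 1 from the axis
v = cross(omega, d)  # Eqn (4-56)
print(v)  # (0, 2, 0)
```

The height of the point along the axis has no effect on v, as the factor
sin θ in the expression for |v| suggests.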
The final two applications of the vector product involve the concept of the
moment of a vector which is first defined and they require the use of a bound
vector.
Definition 4-21 We define M = d × Q to be the moment of vector Q
about the point O, where d is the position vector relative to O of any point on
the line of action of the bound vector Q.
This definition is illustrated in Fig. 4-18 in which the plane Π contains the
vectors d and Q and, by virtue of the definition of the moment, M is normal
to Π.
The natural mechanical applications of this definition are to the moment
of a force and to the moment of momentum about a fixed point. In both
situations the line of action of the vector whose moment is to be found is
important, as is its point of application in some circumstances.
Fig. 4-17 Angular velocity.
Fig. 4-18 Moment of a vector about O.
If Q is identified with a force F, then the expression
$$\mathbf{M} = \mathbf{d}\times\mathbf{F} \tag{4-57}$$
is the moment or torque of the force F about O. If the force is expressed in
lb and the displacement vector in ft, the units of torque are lb-ft. Similarly,
if Q is identified with the momentum mv of a particle of mass m moving with
velocity v, then the vector
$$\mathbf{M} = \mathbf{d}\times(m\mathbf{v}) = m\,\mathbf{d}\times\mathbf{v} \tag{4-58}$$
is the moment of momentum or the angular momentum of the particle about O.
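Both (4-57) and (4-58) are instances of the same vector product, as the
following minimal sketch shows. All numerical values are illustrative and not
from the text, and `moment` is a hypothetical helper name.

```python
def cross(u, v):
    """Vector product u x v for three-component vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def moment(d, Q):
    """Moment M = d x Q of the bound vector Q about O (Definition 4-21)."""
    return cross(d, Q)

d = (1, 0, 0)                 # illustrative point on the line of action
torque = moment(d, (0, 3, 0))  # Eqn (4-57) with F = 3j
m, velocity = 2, (0, 1, 0)
angular_momentum = moment(d, tuple(m * c for c in velocity))  # Eqn (4-58)
print(torque, angular_momentum)  # (0, 0, 3) (0, 0, 2)
```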
PROBLEMS
Section 4-1
4-1 Give a graphical representation of each of the following velocities by drawing
directed line elements. In each case indicate the sense of the vector with an
arrow:
(a) 4 ft/s in a north-east direction;
(b) 2.5 ft/s in a south-west direction;
(c) 5 ft/s due west.
What velocities would these same directed line elements represent if the
arrows were reversed ?
4-2 Classify each of these quantities as scalar or vector:
(a) volume; (b) length; (c) momentum = mass x velocity; (d) electric
field; (e) speed; (f) acceleration; (g) density; (h) chemical concentration;
(i) electrostatic capacity; (j) moment of a force.
4-3 Find the roots of each equation:
(a) x² = −36; (b) x² = −27; (c) x² = 25; (d) x² = −2.
4-4 Find the roots of these quadratic equations:
(a) x² + 3x + 3 = 0; (b) x² − 3x + 2 = 0; (c) x² + 4x + 5 = 0.
4-5 By setting x² = w, reduce the following quartic equations to quadratic
equations, and hence obtain their roots:
(a) x⁴ + x² − 2 = 0; (b) x⁴ + 5x² + 6 = 0; (c) x⁴ − 5x² + 6 = 0.
4-6 Find the real and imaginary parts of each of these complex numbers:
(a) z = 9 − 6i; (b) z = 32; (c) z = 14 + 2i; (d) z = 17i; (e) z = −3 + i.
4-7 Write the following numbers in real-imaginary form given that their real
and imaginary parts are:
(a) Re z = −11, Im z = 1; (b) Re z = 0, Im z = −3;
(c) Re z = 0, Im z = 0; (d) Re z = 4, Im z = 17.
Section 4-2
4-8 Which of these complex numbers are equal?
z₁ = 2 − i, z₂ = 1 − i, z₃ = 4 + i, z₄ = 1 − i, z₅ = 2 + i, z₆ = 2 − i,
z₇ = 1 − i.
4-9 Given that the following complex numbers are equal, deduce the values of
a and b:
(a) 2 − 3i = 2 + ib; (b) a + 4i = 1 + ib;
(c) 3 + 7i = a + ib; (d) 5 + ia = b + 6i.
4-10 Use Definitions 4-2 and 4-3 together with the real number axioms to prove
that (a) z₁ + z₂ = z₂ + z₁, thereby showing that complex addition is
commutative, and (b) z₁ − z₂ = −(z₂ − z₁).
411 Form the sums z\ + zz given that:
(a) zi = 3 — /', Z2 = 4 + li\
(b) zi = -2 - 4/, z 2 = 2 + 3/';
(c) zi = 5 + 6/', Z2 = —5 — 6/';
(d) zi = 4 - 3/, z 2 = 2 + 3/.
4-12 Form the differences z\ — Z2 given that:
(a) zi = 2 + 6/, z 2 = 4 + 2i;
(b) zi = -2 + /, z 2 = -2 + 2/;
(c) zi = 4 + li, z 2 = 2 + 7/;
(d) z\ = 3/, z 2 = 1 + 3/.
4- 13 Form the products zizo given that:
(a) zi = 1 + /, z 2 = 2 + 3/;
(b) zi = 3 - 5;, z 2 = 3 + 5/';
(c) 2i = /', Z2 = 4 — 3/;
(d) 2i = 2, z 2 = 9 - i.
4-14 Evaluate (1 + 5 - (1 - if-
1 1
4-15 Evaluate
(1 + i)« T (1 - ;) 4
4- 16 Solve these equations for z:
(a) 3z + (9 + 6/) = 7 + 3/; (b) 2z + (3 - 2i) = 3 - 2»;
(c) 4z - (4 + 6/) = -3 + i; (d) 3z + (2 + /) - 3(1 + 2/) = 1 + /.
4-17 Form the quotients zi/z2 given that:
(a) zi = 3 + 2/, z 2 = 1 - /; (b) zi = 9 + 3/, z 2 = 3 + /;
(c) zi = 8 + 4/', z 2 = 2 - 4;'.
4-18 Solve these equations for z:
(a) 2z(3 + = 2 + 3/; (b) 3z(l - 2/) = 1 + 4/;
(c) 4z(l -0 = l + j; (d) 2z(4 + 0=1+ 4».
419 Use Definition 4-4 and the real number axioms to prove that zizo = z«z\
thereby showing that complex multiplication is commutative.
4-20 Use Definition 4-5 to prove that:
<a)*-<7>; (b) (i)=#
(c) £5) = (z)3; (d) (~ j = z lf2 .
4-21 Use the real number axioms together with Definition 4-5 to prove that:
(Zl + Z2 + • • ■ + Zn) = Zl + Z 2 + • • • + Z„.
4-22 State which of the following polynomials have at least one real root and
which, if they have complex roots, will have them occur in complex conjugate
pairs. If no deductions can be made about the nature of the roots, then say so.
(a) P(z) = z⁵ + 16z⁴ + z² + 3z + 1;
(b) P(z) = z⁴ + 3z³ + 2z² + 1;
(c) P(z) = z⁷ + 5z⁵ − 2z² + z + i;
(d) P(z) = z³ − 6z² + 2z + 4.
4-23 Given that z = 2 + 3i is a root of the polynomial
P(z) = z⁴ − 4z³ + 12z² + 4z − 13,
deduce the values of the other three roots. Factorize P(z) into linear and
quadratic factors with real coefficients.
4-24 Given that z = i is a root of the polynomial
P(z) = z⁵ − 2z⁴ + 10z³ − 20z² + 9z − 18,
deduce the values of the other four roots. Factorize P(z) into linear and
quadratic factors with real coefficients.
Section 4-3
4-25 Plot the following vectors z₁ and z₂ in the complex-plane and use geometrical
methods to form their sum z₁ + z₂ and their difference z₁ − z₂:
(a) z₁ = 2 + 3i, z₂ = −1 + 2i; (b) z₁ = 3, z₂ = 4 − i;
(c) z₁ = 4i, z₂ = 3 − 4i; (d) z₁ = −1 − 2i, z₂ = −1 + 2i.
4-26 Find the modulus of each of these vectors:
(a) 4 − 3i; (b) −2 + 3i; (c) 2 − 3i; (d) 3 + 4i; (e) 5i.
4-27 Use Definitions 4-5 and 4-7 to prove that:
$$\text{(a) } z\bar{z} = |z|^2; \qquad \text{(b) } |z_1 z_2| = |z_1|\,|z_2|;$$
and give an inductive proof that
$$|z_1 z_2 \cdots z_n| = |z_1|\,|z_2| \cdots |z_n|.$$
4-28 Given that z₁ = 3 + 4i, z₂ = 4 − 3i, z₃ = 2 + i, and z₄ = √3 + i, use the
results of the previous problem to compute |z₁z₂|, |z₁z₂z₃|, and
|z₁z₂z₃z₄|. Check your results by direct computation and compare the
relative labour of computation.
4-29 Use the properties of the complex conjugate operation to prove that for any
two complex numbers z₁ and z₂,
$$z_1\bar{z}_2 + \bar{z}_1 z_2 = 2\,\mathrm{Re}\, z_1\bar{z}_2.$$
Then, using this result together with the obvious inequality
$$|\mathrm{Re}\, z_1\bar{z}_2| \le |z_1 z_2|$$
and the identity
$$|z_1 + z_2|^2 = (z_1 + z_2)(\bar{z}_1 + \bar{z}_2),$$
prove the triangle inequality,
$$|z_1 + z_2| \le |z_1| + |z_2|.$$
4-30 Use the same form of argument as in Problem 4-29 together with the obvious
inequality Re z₁z̄₂ ≥ −|z₁z₂| to prove
$$\big|\,|z_1| - |z_2|\,\big| \le |z_1 + z_2|.$$
4-31 Give two examples in which the triangle inequality is strict (that is, the sign
≤ is replaced by <). Give two further examples in which it reduces to an
equality.
4-32 Give two examples in which the inequality ||z₁| − |z₂|| ≤ |z₁ + z₂| is
strict. Give two further examples in which it reduces to an equality.
Section 4-4
4-33 Express these numbers in modulus-argument form:
(a) z = −3 + 4i; (b) z = −3 − 4i;
(c) z = −3 + 3i; (d) z = 2√3 − 2i.
4-34 Express the following numbers z in real-imaginary form given that :
(a) | z | = 4, arg z = |; (b) | z | = 2, arg z = ^;
(c) | z | = 6, arg z = y ; (d) | z | = 3, arg z = ~
4-35 Use the modulus-argument representation of complex numbers to prove :
(a) Zl Z2 • • • Zn = Zl . Z2 • • • Zn\ (b) (z") = (f)";
(c)
and arg |-| = —arg;
4-36 Given the following numbers z in real-imaginary form, compute the products
iz. Plot the results in the complex-plane and verify that the effect of
multiplication by i is to rotate a vector anti-clockwise through an angle ½π
without change of size:
z = 3 − 2i; z = −2 + i; z = i; z = −1 − i.
4-37 Form the products z\ zo and the quotients z\jzo of the following numbers
expressed in modulus-argument form:
(a) zi = 3(cos ,'jjr + /sin \tt)\ z 2 = |(cos }» + /sin \tt);
(b) z\ — 4(cos \-n — /sin \-t)\ zi = 2(cos i^ + /sin Jtt);
(c) zi = iHcos iw — / sin iw); z-i = 6(cos 3w/2 — / sin 3-n-/2).
4-38 The second-order difference equation
$$a u_n + b u_{n-1} + c u_{n-2} = 0$$
has for its general solution the expression
$$u_n = A\lambda_1^n + B\lambda_2^n$$
whenever the characteristic equation
$$a\lambda^2 + b\lambda + c = 0$$
has the distinct real roots λ₁ and λ₂. If b² − 4ac < 0, so that the
characteristic equation has the complex conjugate roots λ and λ̄, show that if
uₙ is to be real, then the constants A and B must also be complex conjugates.
Hence show that if b² − 4ac < 0 and |λ| = r, arg λ = θ, then the general
solution is expressible in the form
$$u_n = r^n(C\cos n\theta + D\sin n\theta),$$
where C and D are real arbitrary constants.
Find the general solution of the following difference equation, and hence
determine the particular solution appropriate to the stated initial conditions:
$$u_n - 3\sqrt{2}\,u_{n-1} + 9u_{n-2} = 0 \quad \text{with } u_0 = 1,\ u_1 = 3.$$
Section 4-5
4-39 Use de Moivre's theorem to express sin 16 and cos 16 in terms of powers of
sin and cos 0.
4-40 Use de Moivre"s theorem to express sin 1 1 and cos 1 1 in terms of powers
of sin 6 and cos 6.
4-41 Evaluate z 20 when z = -\ '3 + /.
4-42 Evaluate z u when z = I — i \ 3.
4-43 Calculate the seventh roots of unity.
444 Find the roots of the equation iv = (— /) 2 < 3 .
4-45 Find the roots of the equation w = (1 + i\ 3) ,/4 .
Section 4-6
4-46 Construct the set of cyclic permutations of the four letters a, b, c, and d.
4-47 Construct a table analogous to Table 4-1 for a left-handed system of axes.
4-48 Determine the lengths |OP| of the vectors OP given that O is the origin
and the points P are:
(a) (1, 1, 1); (b) (−2, 1, 3); (c) (−1, −1, −1); (d) (3, −2, −4).
4-49 Find the lengths |OP|, the direction cosines and the angles θ₁, θ₂, θ₃ of the
vectors OP, where the points P are:
(a) (2, −1, −1); (b) (4, 0, 2); (c) (−1, 2, 1).
4-50 Find the direction ratios, the direction cosines and the angles θ₁, θ₂, θ₃ of
the vectors OP, where the points P are:
(a) (1, 1, 1); (b) (−1, 1, 1); (c) (2, 1, −1).
4-51 Determine the angles 0i, 02, 03 for the vectors with the direction cosines:
452 Given that a vector makes acute angles with each of the coordinate axes and
that its direction cosines are Im, m, —pi\, deduce the value of m and hence
find the angles. \ ^ 1
4-53 Use the fact that a vector makes an acute angle with each of the coordinate
axes and that its direction ratios are 1, 2, 2 to determine the angles θ₁, θ₂,
and θ₃ that it makes with the coordinate axes.
4-54 Determine the lengths |AB| of the vectors AB, given that the end points
A and B are:
(a) A = (1, 1, 1), B = (2, 0, 1);
(b) A = (2, −1, 1), B = (−2, 2, 2);
(c) A = (−1, 3, 1), B = (−2, −1, 0).
Use your results to determine the direction cosines for each of these vectors.
4-55 Write down the position vectors OP in terms of the unit vectors i, j, k given
that O is the origin and the points P are:
(a) (1, 1, 1); (b) (−2, 3, 7); (c) (3, −1, 11); (d) (0, 1, 0).
4-56 Write down the x, y, and z-components of these vectors:
(a) 3i − 2j + k; (b) −i + 3j + 11k; (c) i − k; (d) j + 3k.
4-57 Form the vector a = αi + βj + γk, given that
(1 − α)i + 2βj + (2γ − 1)k = 2i + j + 3k.
4-58 Determine the values of α, β, and γ in order that:
(1 - a)i + W ~ * 2 )j + (3' ~ 2)k = li + 3j + 2k.
459 Form the sum a + b and difference a — b of the vectors:
(a) a = 3i - 2j + k, b = -i - 2j + 3k;
(b) a = -i + 2j - k, b = 2i - 4j + 2k;
(c) a = 2j - 3k, b = 2i - j + k.
4-60 Prove from the definitions of addition and subtraction of vectors that for
any vectors a and b
(a) a + b = b + a and (b) a - b = -(b - a).
4-61 Find |AB| and the direction cosines of the vectors AB given that A and B
are the points:
(a) A = (1, 1, 1), B = (2, -1, 1);
(b) A = (2,0,-1), B = (1,2,1);
(c) A = (1, 1, 1), B = (-1,-1,-1).
4-62 State which of the following pairs of vectors a and b are parallel and which
are anti-parallel :
(a) a = i - 3j + k, b = -4i + 12j - 4k;
(b) a = -2i + 3j - k, b = 2i - 3j + k;
(c) a = 4i - j - 3k, b = 8i - 2j - 6k;
(d) a = i + 7j + k, b = 3i + 21j + 3k.
Section 4-7
4-63 Express the following vectors a as the product of a scalar and a unit vector:
(a) a = 2i - j + 3k; (b) a = 3i - 3j + k; (c) a = -^-i + -j - ^k.
4-64 Find the vectors AB, and their direction cosines given that A and B have
position vectors a and b, respectively, where
(a) a = 3i - 3j + 5k, b = i + 2j - k;
(b) a = 2i + 2j + k, b = i + 3j + 2k.
4-65 Verify the inequalities 1 1 a | - | b 1 1 < | a + b | < | a | + | b | for the pairs
of vectors :
(a) a = i - 2j - k, b = 2i - 3j + k;
(b) a = 3i - 4j + k, b = 6i - 8j + 3k;
(c) a = 2i + 3j - k, b = -6i - 9j + 3k.
4-66 Find the angle between the vectors a and b where:
(a) a = i + j + k, b = 2i + j - k;
(b) a = -i + 2j + 2k, b = 2i — j — 2k.
4-67 Give two examples of pairs of vectors that are orthogonal but are not parallel
to the vectors i, j, or k.
4-68 Give two different proofs of the fact that scalar multiplication of vectors is
commutative by using the two alternative definitions of the scalar product.
4-69 Find the scalar products a . b and hence find the angle between the vectors
a and b given that :
(a) a = 7i - 3j + k, b = -i + 2j + 2k;
(b) a = 2i - 2j + k, b = — 3i — 3j + 4k;
(c) a = i + 2j + 3k, b = -2i - 4j - 6k.
4-70 Find unit vectors parallel to the vectors a where :
(a) a = 2i - 2j + k; (b) a = -3i + j + 2k; (c) a = 7i - 2j - 3k.
4-71 Prove the distributive law for the scalar product by using either definition of
the scalar product.
4'72 Form the vector products a x b if:
(a) a = i - 2j - 4k, b = 2i - 2j + 3k;
(b) a = -i + 4j - k, b = 3i + 2j + 4k;
(c) a = -2i + 4k, b = 3j - 2k.
4-73 Evaluate the determinants:
(a)
2 1
(b)
4 16
(c)
2
(d)
3 9
4 6
'
-2 6
'
16
*
1 3
4-74 For what values of A, if any, do these determinants vanish:
(a)
4-75 Evaluate the determinants:
A 2
(b)
A 2
(0
3 A
(d)
3 1
3 2A
'
2
'
(a)
2 1 1
1 2 1
1 1 1
(b)
3 4 5
(c)
2 2 1
»
1 2
3 4 5
3 1 2
6 5 7
3A 4
2 -A
4-76 For what values of A do the following determinants vanish :
(a)
A 1 2
(b)
1 A 1
;
2 2 1
A
1
ic)
2A
1
;
1
3
1
2
A
1
1
A
1
4-77 Use Definition 418 to prove that a x b = — (b x a) for arbitrary vectors
a and b.
4-78 Evaluate the vector products b x a given that :
(a) a = 2i - j + 2k, b = -3i + 2j + k;
(b) a = -i + j + k, b = 4i + 2j + 3k;
(c) a = -i - j - k, b = 2i + 2j + 2k.
479 Determine unit vectors that are normal to both vectors a and b when :
(a) a = 3i + 5j - 2k, b = i + j + k;
(b) a = -4i + 2k, b = j - 3k.
State whether the results are unique and, if not, in what way are they in-
determinate.
4-80 Use the definition of a vector product to prove that it is distributive and so
ax(b + c) = axb + axc.
4-81 Use Definition 418 of a vector product to prove that when a and b are non-
zero vectors, then a x b = if, and only if, a and b are parallel.
4-82 Use Definition 418 to evaluate the vector products a x b given that:
(a) a = -i + 4j - 2k, b = 2i + 3j + k;
(b) a = -2i - 3j + k, b = 6i + 9j - 3k;
(c) a = 3i - k, b = 2j.
Evaluate these same vector products using Table 4-3 and compare the effort
involved.
4-83 Verify the distributive property of the vector product:
ax(b + c) = axb+axc,
given that a = 2i + j — k, b = i — z) + k and c = 3i — 2j + 3k.
4-84 Evaluate the triple scalar products a . (b x c) and (b x a) . c given that:
(a) a = 2i - j - 3k, b = 3k, c = i + 2j + 2k;
(b) a = i + 2j + k, b = 2i + j + k, c = 4i + 2j + 2k.
4-85 Prove that if a, b, and c form three edges of a parallelepiped all meeting at a
common point, then the volume of this solid figure is given by | a (b x c) I
Deduce that the vanishing of the triple scalar product implies that the vectors
a, b, and c are co-planar (that is, all lie in a common plane).
4-86 Determine the vector products a x (b x c) given that :
(a) a = 2i - j - 3k, b = 3i + j + k, c = -i + j + k;
(b) a = -i + j - k, b = 2i - 2j + 2k, c = i + k.
4-87 Prove that (a x b) x c = (a . c)b - (b . c)a.
Section 4-8
4-88 Find the vector equation of the line through the point with position vector
2i − j − 3k which is parallel to the vector i + j + k. Determine the points
corresponding to λ = −3, 0, 2 in the resulting equation.
489 v^ctortT-ll ? i at 'T ° f ^k 116 throUgh the P° intS A and B with Potion
cSnS of rnis'lmV " " "* " = "' + j + * ° et ™ the *«*<»
4-90 The equations
3 *+ 3 -2y + 1 _ 2z + 6
2 7 ~
SSefof rhe £? ^ **"" ta ' m ™** f ° m and find *e direction
4-91 If the points A and B have position vectors a and b, and point C divides the
line AB in the ratio λ : μ, show that C has the position vector
$$\frac{\mu\mathbf{a} + \lambda\mathbf{b}}{\lambda + \mu}, \quad \text{provided } \lambda + \mu \ne 0.$$
4M S£in he V6 f ° r eqUati ,° n ° f thC Hne that P asses throu g h th « point A with
rhere b n : e r: r 2j T 7k 2, an"d J c + = k _Tti ^ " ^ ** ^ " "^
4-93 Find ^he perpendicular^ distance of the point 2i + j + k from the „„e
4-94 Find the perpendicular distance of the point i + 3j + 2k from the line
2x - 1 y + 2 z- 1
2 3 -~r~
4 ' 95 a^no'rmairf+jTk 11011 ° f "" *"" ^^ «* ^ * " J + 2k
4-96 Find the Cartesian equation of the plane containing the point 3i - k and
alw contauung the two vectors a, £ where a = i S + 2j I k and B = -1
4-97 Find the angle between the two planes
jc— l_y + 2_z— 3
2 r~ ~ 3
and
2x-l _ y-l _ 2z + 1
2 ~ 3 ~ 3
4-98 Find the angle between the plane z = 2 and the plane
x + 2 y - 1 z+ 1
4-99 Let Π be the plane r · n̂ = p, where n̂ is the unit normal to Π and p is its
perpendicular distance from the origin. By constructing a plane Π′ parallel
to Π through point P with position vector α, show that the perpendicular
distance of P from Π is given by the expression |α · n̂ − p|. What form
would this expression take if the plane was expressed in the form r · n = q,
where |n| ≠ 1?
4-100 A line may be uniquely determined as the intersection of two planes
r · n₁ = p₁ and r · n₂ = p₂ (A)
where n₁ and n₂ are not necessarily unit vectors. The direction of the line is
normal to both n₁ and n₂ and so is parallel to n₁ × n₂. Hence the line has the
equation r = a + λ(n₁ × n₂) where λ is a parameter and a is some point
common to the two planes in (A). Apply these arguments to obtain the vector
equation of the line determined by the planes
x + 2y − z = 3 and 2x + y + 2z = 1.
4-101 Find the Cartesian equation of the sphere of radius 3 about the centre
a = 2i + 3j + k.
4-102 Construct the Cartesian equation of the sphere of radius 4 that lies on the
side z > 0 of the plane z = 0 and is tangent to it at the point (3, 1, 0).
4-103 The inward drawn normal to a sphere of radius 2 at the point (1, 1, 2) on its
surface is n = 2i − j + k. Deduce its equation in Cartesian form.
Section 4-9
4-104 Forces F₁, F₂, F₃, and F₄ have magnitudes 2√6, 3√5, 3, and 15 lb and act
concurrently through a point O along the lines of the vectors −i + 2j − k,
2i + k, 2j, and 4i + 3j, respectively. Find the resultant of these forces and
determine its magnitude in lb.
4-105 Forces 1, 2, and 3 act at one corner of a cube along the diagonals of the faces
meeting at that corner. Find the magnitude of their resultant, and its
inclination to the edges of the cube.
4-106 A sphere of 10-in radius and mass 20 lb has one end of a string 18 in long
attached to its surface and hangs at rest against a smooth vertical wall to
which the other end of the string is attached. The string has a tension T and
the wall exerts a normal reaction R at its point of contact with the sphere.
Use a vector triangle of forces to determine T and R.
4-107 Deduce that for three concurrent forces F₁, F₂, and F₃ to be in equilibrium
they must form a closed vector triangle of forces, and hence be coplanar.
Use your result to prove Lami's theorem, which asserts that when three
concurrent forces are in equilibrium, the magnitude of each force is
proportional to the sine of the angle opposite to it in the vector triangle of
forces.
4-108 Find the centre of mass of the masses 1, 3, 4, and 2 lb situated at points with
the respective position vectors 3i − j + k, 2i + 2j + 2k, −i + 7j − k, and
4i − 10k.
4-109 Prove that the centre of mass of a system of masses is independent of the
choice of origin.
(Hint: Choose a new origin O′ with position vector b relative to the original
origin O and apply the definition of centre of mass.)
4-110 The velocity of a boat relative to the water is represented by 4i + 3j, and that
of the water relative to the earth by 2i − j. What is the velocity of the boat
relative to the earth if i and j represent velocities of 1 mile/h to the east and
north, respectively?
4-111 The point of application of the force 9i + 6j + 7k moves a distance 5 ft in
the direction of the vector 3i + j + 4k. If the modulus of the force vector is
equal to the magnitude of the force in lb, find the work done.
4-112 A body spins about a line through the origin parallel to the vector 2i − j + k
at 15 rad/s. Find the angular velocity vector Ω for the body and find the
instantaneous linear velocity of a point in the body with position vector
i + 2j + 3k.
4-113 Find the torque of a force represented by 3i + 6j + k about point O given
that it acts through the point with position vector −i + j + 2k relative to O.
4-114 Masses 1, 3, and 2 units at the points specified by the position vectors 3i − k,
2i − 3j + k, and i + j + k relative to point O have velocities represented by
2j + k, 3i + j + 2k, and i − j + k, respectively. Determine the vector sum
of the moments of momentum of each of these masses about O.
Differentiation of functions
of one or more real
variables
5-1 The derivative
The important branch of mathematics known as the calculus is concerned
with two basic operations called differentiation and integration. These
operations are related and both rely for their definition on the use of limits.
The calculus was founded jointly, and independently, by Newton in
England, and by his contemporary Leibnitz in Germany to whom we owe
the essentials of our present day notation. In introducing the ideas underlying
a derivative we shall make use of a simple dynamical problem in very much
the same way that Newton did when first formulating his early ideas on
differentiation. However we have the advantage of understanding the nature
of a limit more clearly than was the case in his day, so that after presenting
our heuristic argument, we shall quickly formalize it in terms of the ideas set
down in Chapter 3.
We shall consider how to define and determine the instantaneous speed
of a point P moving in a non-uniform manner along a straight line. To be
precise, we shall suppose that a fixed point O on the line has been selected,
and that the distance s of point P from O at time t is determined by the
equation
$$ s = f(t), $$
where f(t) is some suitable continuous function of t defined on some interval
J. Thus we know the position of P at a general time t, and are required to
use this information to define and find the speed of P at any given instant of
time. When the motion of P is uniform, so that its displacement is proportional
to the elapsed time, the familiar definition of speed as distance per unit time
can be used. However if the motion is non-uniform we must consider the
situation more carefully. We shall use intuition here and first consider the
difference quotient
$$ \frac{f(t_2) - f(t_1)}{t_2 - t_1} \tag{5-1} $$
in which $t_1$ and $t_2$ are two different times belonging to J.
It seems reasonable to suppose that if $t_2$ were to be taken sufficiently
close to $t_1$ then expression (5-1), which is the quotient of the finite distance
travelled and the elapsed time, would in some sense provide a measure of the
average speed of P in the small time interval $t_2 - t_1$. Even better would be the
idea that we compute the difference quotient (5-1) not for one time $t_2$ close
to $t_1$, but for a monotonic sequence $\{\tau_i\}$ of times having for its limit the time
$t_1$ which is not a member of the sequence. This last condition is necessary
because Eqn (5-1) is not defined if $t_2 = t_1$. Then if the sequence of difference
quotients corresponding to Eqn (5-1) has a limit we propose to call the value
of this limit the instantaneous speed $u(t_1)$ of P at time $t_1$.
Expressed in the symbolic form of Chapter 3 we may write
$$ u(t_1) = \lim_{\tau_i \to t_1} \left[ \frac{f(\tau_i) - f(t_1)}{\tau_i - t_1} \right]. \tag{5-2} $$
This definition is obviously consistent with the case of the uniform motion
of P, for then every difference quotient involved in the determination of the
limit (5-2) would give the same constant value u, say. We will call this value u
the constant speed of P.
As the function f(t) is continuous it is clearly desirable that we define $u(t_1)$
not in terms of the discrete variable $\tau_i$ but in terms of a continuous variable $\tau$.
Fortunately we can do this easily, for the conditions of the connecting
Theorem 3-6 are satisfied and allow us to rewrite Eqn (5-2) thus:
$$ u(t) = \lim_{\tau \to t} \left[ \frac{f(\tau) - f(t)}{\tau - t} \right]. \tag{5-3} $$
We have now dropped the suffix 1 since $t_1$ was not specific and represented
any value of the time t belonging to J.
It should be appreciated that the limit u(t) in Eqn (5-3) is a number and
not a ratio of quantities as were the members of the sequence used to define
the limit. The instantaneous speed u(t) can be interpreted as the distance
through which P would move in unit time if, during that time, it were to move
at a constant speed equal to the value u(t). Because Eqn (5-3) is consistent
with the notion of a constant speed, it is customary to omit the adjective
'instantaneous' and to speak only of the speed of P.
The limit involved in Eqn (5-3) is of the indeterminate type and it will be
our object to devise techniques for evaluating such limits for a wide class of
functions f(t). In trivial cases these may be determined by simple algebraic
considerations as this example shows.
Example 5-1 Suppose that the distance of a point P from a fixed origin at
time t is determined by the equation $f(t) = kt^3$, where k is a constant with
dimensions (Length)(Time)$^{-3}$. Find the functional form of the speed u(t)
at time t, and determine its value when t = 4.
Solution We are here required to evaluate the limit
$$ u(t) = \lim_{\tau \to t} \left[ \frac{k(\tau^3 - t^3)}{\tau - t} \right], $$
which is the form assumed by Eqn (5-3) when $f(t) = kt^3$.
Using the identity $\tau^3 - t^3 = (\tau - t)(\tau^2 + \tau t + t^2)$ we may write
$$ u(t) = \lim_{\tau \to t} \left[ \frac{k(\tau - t)(\tau^2 + \tau t + t^2)}{\tau - t} \right] $$
$$ = \lim_{\tau \to t} k(\tau^2 + \tau t + t^2) $$
$$ = 3kt^2. $$
Thus the functional form of the speed is $u(t) = 3kt^2$, so that at t = 4 the
speed has the value u(4) = 48k.
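The limiting process of Example 5-1 can also be watched numerically. The following Python sketch is an illustration only: the value k = 1 and the names `difference_quotient` and `distance` are assumptions introduced for the demonstration, not part of the text. It evaluates the difference quotient of Eqn (5-3) at t = 4 for values of τ that approach t.

```python
def difference_quotient(f, t, tau):
    """Average speed of P between the times t and tau (tau must differ from t)."""
    return (f(tau) - f(t)) / (tau - t)

k = 1.0  # illustrative value of the constant k (an assumption)

def distance(t):
    """Distance f(t) = k t^3 of Example 5-1."""
    return k * t ** 3

t = 4.0
for step in (1.0, 0.1, 0.01, 0.001):
    tau = t + step
    q = difference_quotient(distance, t, tau)
    print(f"tau - t = {step:<6}  quotient = {q:.6f}")
# The quotient equals k*(tau**2 + tau*t + t**2), which tends to 3*k*t**2 = 48*k.
```

The printed quotients decrease towards 48k as the step shrinks, in agreement with the algebraic result u(4) = 48k.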
It is often helpful to check the form of a result by means of dimensional
analysis. This is achieved by representing the fundamental quantities of mass,
length, and time occurring in expressions and equations by the symbols M,
L, and T, and ignoring any purely numerical multipliers that may be involved.
The equations then become identities between expressions of the form $L^p M^r T^s$,
where p, r, and s are real numbers. Quantities other than length, mass, and
time are represented as suitable combinations of these fundamental quantities.
Thus speed and acceleration would be written $LT^{-1}$ and $LT^{-2}$, respectively,
with no account being taken of their magnitudes. We illustrate this approach
with Example 5-1. By supposition k has dimensions $LT^{-3}$, so that from the
form of the solution we see that u(t) must have the dimensions $kT^2 = (LT^{-3})T^2
= LT^{-1}$, which are the dimensions of speed, as required.
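The bookkeeping of a dimensional check amounts to adding exponents, and this can be sketched in a few lines of Python. The representation below, a triple of exponents (p, r, s) for $L^p M^r T^s$, and the names `Dim`, `multiply` and `power` are assumptions made for this illustration.

```python
from collections import namedtuple

# A dimension L^p M^r T^s is stored as its three exponents.
Dim = namedtuple("Dim", ["L", "M", "T"])

def multiply(a, b):
    """Dimensions of a product: the exponents add."""
    return Dim(a.L + b.L, a.M + b.M, a.T + b.T)

def power(a, n):
    """Dimensions of a quantity raised to the power n: the exponents scale."""
    return Dim(a.L * n, a.M * n, a.T * n)

TIME = Dim(0, 0, 1)    # T
K = Dim(1, 0, -3)      # the constant k of Example 5-1, L T^-3
SPEED = Dim(1, 0, -1)  # L T^-1

# u(t) = 3 k t^2: the numerical multiplier 3 is ignored.
u_dims = multiply(K, power(TIME, 2))
print(u_dims, u_dims == SPEED)
```

The computed exponents are those of $LT^{-1}$, so the form $3kt^2$ is dimensionally consistent with a speed.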
Fig. 5-1 Speed interpreted as a derivative (distance plotted against time t).
There is a valuable graphical interpretation of the limit (5-3) shown in
Fig. 5-1, which is the graph of a function f(t) together with the chord PQ,
where P is the point $(t, f(t))$ and Q the point $(\tau, f(\tau))$.
The difference quotient within the brackets of Eqn (5-3) before the limit
is taken is the tangent of the angle QPR. In the limit as $\tau \to t$, the point Q
approaches the point P and the chord PQ approaches the tangent PS to the
curve y = f(t) at P. The value u(t) arrived at by considering the limit of the
difference quotient (5-3) is thus the tangent of the angle SPR and so is equal
to the gradient or slope of the curve y = f(t) at P. The number $u(t_1)$ evaluated
at any specific time $t = t_1$ is the derivative of f(t) with respect to t at $t = t_1$.
The limit u(t) as a function of t is simply called the derivative of f(t) with
respect to t and the operation of computing the derivative of a function is
called differentiation. A function that possesses a derivative at each point of an
interval is said to be differentiable in that interval. Hence in Example 5-1, the
derivative of $kt^3$ with respect to t at t = 4 is 48k, whereas the derivative of
$kt^3$ with respect to t is the function $3kt^2$. The function $kt^3$ is obviously
differentiable in any finite interval.
This heuristic approach has served to introduce the limiting arguments
underlying the concept of a derivative, and we must now carefully reformulate
these arguments and express them in general terms. We shall use the following
key definitions.
definition 5-1 A function f(x) of the real variable x will be said to be
differentiable at $x_0$ if, and only if,
$$ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} $$
exists and is independent of the side from which x approaches $x_0$. More
generally, f(x) will be said to be differentiable in an interval J if it is differentiable
at each point of J. At any points of J for which the limit is not defined
the function f(x) will be said to be non-differentiable.
definition 5-2 If f(x) is a differentiable function of the real variable
x at $x_0$, then the value of the expression
$$ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} $$
will be denoted by $f'(x_0)$ or $\left.\dfrac{df}{dx}\right|_{x = x_0}$, and we shall say that it is the derivative
of f(x) at $x = x_0$. If further we define y by the equation y = f(x), then we
can also write the derivative of f(x) at $x_0$ in the form $\left.\dfrac{dy}{dx}\right|_{x = x_0}$.
These definitions merely express in a more sophisticated way, what is
usually put as follows.
Let y = f(x). Then if $\delta y$ is the increment in y occasioned by an increment
$\delta x$ in x, we have $y + \delta y = f(x + \delta x)$ and hence
$$ \frac{\delta y}{\delta x} = \frac{f(x + \delta x) - f(x)}{\delta x}. $$
Thus at $x = x_0$,
$$ \frac{\delta y}{\delta x} = \frac{f(x_0 + \delta x) - f(x_0)}{\delta x} $$
and so
$$ \left.\frac{dy}{dx}\right|_{x = x_0} = \lim_{\delta x \to 0} \frac{f(x_0 + \delta x) - f(x_0)}{\delta x}. $$
To obtain the formulation of Definition 5-2 above, first write h in place
of $\delta x$ to obtain
$$ \left.\frac{dy}{dx}\right|_{x = x_0} = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} $$
and then write x in place of $x_0 + h$, so that $h = x - x_0$.
What does the requirement, that $\lim_{x \to x_0} \{[f(x) - f(x_0)]/(x - x_0)\}$ should
exist, actually mean? It is this. There is a number $f'(x_0)$ such that the left-
and right-hand limits of the function $\varphi(x) = [f(x) - f(x_0)]/(x - x_0)$ as x
approaches $x_0$ exist and are both equal to $f'(x_0)$. The function $\varphi(x)$ itself is
defined near but not at $x = x_0$ but has the property that $\lim_{x \to x_0} \varphi(x) = f'(x_0)$.
We shall use this idea together with Theorem 3-4 when we discuss the general
properties of derivatives of combinations of functions.
If in Definition 5-2 we write $x_0 + h$ in place of x, and replace $x_0$ by x in
the subsequent result, we may formulate this definition.
definition 5-3 If y = f(x) is a differentiable function of the real variable
x at all points of an interval J, then the derivative of f(x) in J is the function
denoted either by f'(x) or dy/dx and defined by
$$ \frac{dy}{dx} = f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}. $$
The operation of computing the derivative of a function is differentiation.
Let us now apply exactly the same arguments to Fig. 5-2 as were used in
connection with the speed at a point of the particle trajectory in Fig. 5-1.
This time the graph represents any function y = f(x) satisfying the conditions
of Definition 5-3. Then if P is any point in the interval within which f(x) is
differentiable, and Q is an adjacent point, the chord PQ is, in some sense, an
approximation to the tangent line to the curve PR at P. The limiting position
of the chord PQ will lie along the tangent line to the curve at P and in terms
of angles we have $\lim_{h \to 0} \theta = \alpha$. However,
$$ \frac{f(x + h) - f(x)}{h} = \tan \theta $$
so that
$$ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \tan \theta $$
whence, finally,
$$ f'(x) = \tan \alpha, \tag{5-4} $$
or, equivalently,
$$ \frac{dy}{dx} = \tan \alpha. \tag{5-5} $$
Fig. 5-2 Derivative interpreted as a gradient.
This result shows that we may interpret the derivative of a differentiable
function at a point as the gradient of the tangent line drawn to the curve at
that point. It is implicit in the definition that the tangent line so defined should
be independent of whether Q approaches P from the left or right.
The geometrical interpretation of a derivative allows us to see quite
clearly that in addition to the function needing to be continuous in the
neighbourhood of a point at which it is required to be differentiable, it also
needs a special kind of smoothness. Specifically, the left- and right-hand
tangents to the curve at the point in question must be one and the same.
Indeed, we could re-phrase our definition of differentiability in terms of the
equality of the left- and right-hand tangents at a point on the curve, just as we
did when dealing with continuity.
Fig. 5-3 Non-differentiable function at $x = x_1$ and $x = x_2$.
Consider the function y = f(x) shown in Fig. 5-3 and defined on the
interval $[x_0, x_3]$, but only continuous in the semi-open intervals $[x_0, x_2)$
and $(x_2, x_3]$.
Then, despite the fact that the function f(x) is continuous in $[x_0, x_2)$
and $(x_2, x_3]$, it is only possible to assert that tangent lines in the sense implied
by Definition 5-3 can be constructed for points in the open intervals $(x_0, x_1)$,
$(x_1, x_2)$, and $(x_2, x_3)$. No tangent line can be constructed at $x_2$ because of the
discontinuity; two tangent lines $l_2$ and $l_3$ can be constructed at point P
according as A and B approach P from the left and the right; whilst only
right- and left-hand tangents $l_1$ and $l_4$ can be constructed at the end points
$x_0$ and $x_3$ because the function f(x) is not defined outside $[x_0, x_3]$.
We shall now show how Definition 5-3 may be used to determine the
derivative of a function and also to prove its non-differentiability at a certain
point. Our example is a continuous function whose behaviour is clear at all
points other than the origin, at which the existence, or otherwise, of a tangent
line to the curve cannot be deduced by inspection of its graph.
Example 5-2 Prove that the function f defined by $f(x) = x \sin(1/x)$ for
$x \neq 0$ and $f(0) = 0$ is continuous in $(-\infty, \infty)$ and sketch its graph. Find its
derivative by use of Definition 5-3 and show that it is not differentiable at
the origin.
Solution Only the behaviour of f in the vicinity of the origin is in doubt here.
When $x \neq 0$ we may write $f(x) = [\sin(1/x)]/(1/x)$ showing that for large x,
f(x) behaves like $\lim_{h \to 0} (\sin h)/h = 1$. Conversely, as the origin is approached,
so $x \to 0$ and because $\sin(1/x)$ is bounded by $\pm 1$ it follows that $\lim_{x \to 0} f(x) = 0$.
The limit of the function f(x) at the origin is thus equal to the functional value
itself and so f(x) is continuous at the origin. It is clearly continuous elsewhere
since it is the product of two continuous functions. Hence it is everywhere
continuous and Fig. 5-4 shows its graph, which is symmetric about the y-axis
because f(x) is an even function.
Fig. 5-4 The function $y = x \sin(1/x)$.
We shall approach the differentiability question in two stages: first for
$x \neq 0$, and then for x = 0. Assuming $x \neq 0$ and making a direct application
of Definition 5-3 we obtain
$$ f'(x) = \lim_{h \to 0} \left[ \frac{(x + h) \sin\left(\dfrac{1}{x + h}\right) - x \sin\dfrac{1}{x}}{h} \right], $$
which we re-express as
$$ f'(x) = \lim_{h \to 0} \left[ \frac{(x + h) \sin\left\{\dfrac{1}{x}\left(1 + \dfrac{h}{x}\right)^{-1}\right\} - x \sin\dfrac{1}{x}}{h} \right]. $$
Now for h close to zero we may use the binomial theorem together with our
'little oh' notation of Section 3-4, to write $[1 + (h/x)]^{-1} = 1 - (h/x) + o(h)$
as $h \to 0$, and hence
$$ f'(x) = \lim_{h \to 0} \left[ \frac{(x + h) \sin\left\{\dfrac{1}{x}\left(1 - \dfrac{h}{x} + o(h)\right)\right\} - x \sin\dfrac{1}{x}}{h} \right]. $$
Next we write the argument of the sine function as
$[(1/x) - (h/x^2) + o(h)/x]$
and use the trigonometric expansion for the difference of two angles to obtain
$$ f'(x) = \lim_{h \to 0} \left[ \frac{(x + h)\left\{\sin\dfrac{1}{x} \cos\left(\dfrac{h}{x^2} - \dfrac{o(h)}{x}\right) - \cos\dfrac{1}{x} \sin\left(\dfrac{h}{x^2} - \dfrac{o(h)}{x}\right)\right\} - x \sin\dfrac{1}{x}}{h} \right]. $$
Consider the behaviour of the terms comprising this quotient. If the first
and last terms are taken together then in the limit as $h \to 0$ they reduce to
the single term $\sin(1/x)$. The remaining term in the centre is
$$ -\frac{(x + h) \cos\dfrac{1}{x} \, \sin\left(\dfrac{h}{x^2} - \dfrac{o(h)}{x}\right)}{h} $$
and since $x \neq 0$ is fixed, it follows from limit (3-9) that this reduces to
$$ -\frac{1}{x} \cos\frac{1}{x} $$
as $h \to 0$.
Combining these two results we find that the derivative f'(x) is
$$ f'(x) = \sin\frac{1}{x} - \frac{1}{x} \cos\frac{1}{x} \quad \text{for } x \neq 0. $$
Thus we have used Definition 5-3 to compute the derivative, and as this is
defined for all $x \neq 0$ it follows that $y = x \sin(1/x)$ is differentiable for all
such x.
Finally we must examine the behaviour of the derivative at the origin
using Definition 5-3. Setting x = 0 we obtain
$$ f'(0) = \lim_{h \to 0} \left[ \frac{h \sin(1/h) - 0}{h} \right] = \lim_{h \to 0} \sin(1/h). $$
As $\sin(1/h)$ oscillates boundedly with ever increasing frequency when
$h \to 0$, it follows that f'(0) is not defined. This establishes the non-differentiability
of f(x) at the origin as was required.
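Both conclusions of Example 5-2 can be illustrated numerically. The Python sketch below is a check under stated assumptions rather than a proof: the sample point x = 0.5, the step sizes, and the function names are all chosen here for illustration. Away from the origin the difference quotient agrees with the formula just derived; at the origin it is evaluated along the particular sequence $h_n = 1/[(n + \tfrac{1}{2})\pi]$, on which $\sin(1/h_n)$ alternates in sign.

```python
import math

def f(x):
    """The function of Example 5-2, with f(0) defined to be 0."""
    return x * math.sin(1.0 / x) if x != 0.0 else 0.0

def f_prime(x):
    """Derivative obtained in the text; valid only for x != 0."""
    return math.sin(1.0 / x) - math.cos(1.0 / x) / x

def difference_quotient(x, h):
    """Difference quotient of Definition 5-3 for a non-zero step h."""
    return (f(x + h) - f(x)) / h

# Away from the origin the quotient settles down to f_prime(x).
x = 0.5
print(difference_quotient(x, 1e-6), f_prime(x))

# At the origin the quotient equals sin(1/h).  Along h_n = 1/((n + 1/2)*pi)
# it alternates between values close to -1 and +1, so it has no limit.
for n in range(1, 7):
    h = 1.0 / ((n + 0.5) * math.pi)
    print(n, round(difference_quotient(0.0, h), 6))
```

A single sequence of steps on which the quotient fails to settle down is enough to show that the limit required by Definition 5-3 does not exist at x = 0.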
We close this section by deducing the derivatives of some important
elementary functions, and stating them as theorems.
theorem 5-1 The derivative of a constant function is zero.
Proof Let k be any constant and consider the function f(x) where f(x) = k
for all x. Then
$$ \frac{f(x + h) - f(x)}{h} = \frac{k - k}{h} = 0 \quad \text{for all } x. $$
Hence
$$ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = 0 \quad \text{for all } x. $$
theorem 5-2 If n is any positive integer, then the real function $y = x^n$
is differentiable everywhere and has the derivative $dy/dx = nx^{n-1}$. If m is
any negative integer, then the function $y = x^m$ is differentiable everywhere
except at the origin and has the derivative $dy/dx = mx^{m-1}$.
Proof We must first consider the limit of the difference quotient
$[(x + h)^n - x^n]/h$. By the binomial theorem we have
$$ \frac{(x + h)^n - x^n}{h} = \frac{x^n + nx^{n-1}h + \dfrac{n(n-1)}{2!} x^{n-2}h^2 + \cdots + \dbinom{n}{r} x^{n-r}h^r + \cdots + h^n - x^n}{h} $$
$$ = nx^{n-1} + \frac{n(n-1)}{2!} x^{n-2}h + \cdots + \binom{n}{r} x^{n-r}h^{r-1} + \cdots + h^{n-1}. $$
Now $\lim_{h \to 0} h = 0$ so $\lim_{h \to 0} h^r = 0$ for $1 \le r \le n - 1$ and so
$$ \lim_{h \to 0} \binom{n}{r} x^{n-r} h^{r-1} = 0 \quad \text{for } 2 \le r \le n. $$
Consequently,
$$ \lim_{h \to 0} \frac{(x + h)^n - x^n}{h} = nx^{n-1}. $$
This is defined for all finite x including x = 0 and so proves the first part of
the theorem. Next let m = -n. Then
$$ \frac{(x + h)^m - x^m}{h} = \frac{(x + h)^{-n} - x^{-n}}{h} = \left( \frac{x^n - (x + h)^n}{h} \right) \frac{1}{x^n (x + h)^n}. $$
Now from our result above
$$ \lim_{h \to 0} \frac{x^n - (x + h)^n}{h} = -nx^{n-1} $$
whilst
$$ \lim_{h \to 0} (x + h) = x \quad \text{and so} \quad \lim_{h \to 0} (x + h)^n = x^n. $$
If $x \neq 0$,
$$ \lim_{h \to 0} \frac{1}{x^n (x + h)^n} = \frac{1}{\lim_{h \to 0} x^n \cdot \lim_{h \to 0} (x + h)^n} = \frac{1}{x^{2n}}. $$
Thus
$$ \lim_{h \to 0} \frac{(x + h)^m - x^m}{h} = -nx^{n-1} \cdot \frac{1}{x^{2n}} = -nx^{-n-1} = mx^{m-1}. $$
Hence we have proved that
$$ \left.\frac{dy}{dx}\right|_{x = x_0} = nx_0^{n-1} \quad \text{for all } x_0 $$
if n is a positive integer, and for all non-zero $x_0$ if n is a negative integer.
Later we shall prove this result for all real n. Henceforth we shall use the
result freely, irrespective of the value of n.
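The binomial argument in the proof can be followed with exact rational arithmetic, so that no rounding error obscures the behaviour of the quotient. The Python sketch below uses the standard `fractions` module; the sample point x = 2, the exponents n = 3 and n = -2, and the function name `power_quotient` are assumptions chosen for illustration.

```python
from fractions import Fraction

def power_quotient(n, x, h):
    """Exact difference quotient [(x + h)^n - x^n] / h for an integer n."""
    return ((x + h) ** n - x ** n) / h

x = Fraction(2)
for n in (3, -2):
    target = n * x ** (n - 1)          # the value n x^(n-1) asserted by Theorem 5-2
    for k in (1, 3, 6):
        h = Fraction(1, 10 ** k)       # h = 10^-k
        q = power_quotient(n, x, h)
        print(f"n = {n:2d}  h = 1e-{k}  quotient - n*x^(n-1) = {float(q - target):.3e}")
```

For n = 3 the difference printed is exactly $3xh + h^2$, the terms of the binomial expansion that vanish with h; for n = -2 the difference also shrinks in proportion to h.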
theorem 5-3 The functions sin ax and cos ax of the real variable x,
where a is any real number, are differentiable everywhere and
$$ \frac{d}{dx}(\sin ax) = a \cos ax, \qquad \frac{d}{dx}(\cos ax) = -a \sin ax. $$
Proof These results follow by applying Definition 5-3 and then using limits
(3-9) and (3-10). Thus we have
$$ \frac{d}{dx}(\sin ax) = \lim_{h \to 0} \frac{\sin a(x + h) - \sin ax}{h} $$
$$ = \lim_{h \to 0} \left[ \frac{\sin ax \cos ah + \cos ax \sin ah - \sin ax}{h} \right] $$
$$ = \sin ax \lim_{h \to 0} \left( \frac{\cos ah - 1}{h} \right) + \cos ax \lim_{h \to 0} \left( \frac{\sin ah}{h} \right) $$
$$ = 0 + a \cos ax. $$
As this function is defined for all finite x, the first part of the required result
has been established. The remainder of the proof follows exactly similar lines,
and so will be omitted.
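For readers who wish to see the omitted half, a sketch of the corresponding steps for cos ax is set out below. It assumes, as in the first half of the proof, that limits (3-9) and (3-10) supply $\lim_{h \to 0} (\sin ah)/h = a$ and $\lim_{h \to 0} (\cos ah - 1)/h = 0$.

```latex
\begin{align*}
\frac{d}{dx}(\cos ax)
  &= \lim_{h \to 0} \frac{\cos a(x + h) - \cos ax}{h} \\
  &= \lim_{h \to 0} \left[ \frac{\cos ax \cos ah - \sin ax \sin ah - \cos ax}{h} \right] \\
  &= \cos ax \lim_{h \to 0} \left( \frac{\cos ah - 1}{h} \right)
     - \sin ax \lim_{h \to 0} \left( \frac{\sin ah}{h} \right) \\
  &= 0 - a \sin ax .
\end{align*}
```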
Example 5-3 Find the derivatives of the following functions stating any
point at which they are not differentiable.
(a) $f(x) = \begin{cases} 3 & \text{for } -\infty < x < 1 \\ 2 & \text{for } 1 < x < \infty. \end{cases}$
(b) $f(x) = x^5$ for all x.
(c) $f(x) = \begin{cases} x^{-3} & \text{for } x \neq 0 \\ 1 & \text{for } x = 0. \end{cases}$
(d) f(x) = sin 4x.
(e) f(x) = cos 7x.
Solution (a) By virtue of Theorem 5-1, the function f(x) has a zero derivative
for all x except at the point x = 1 where it is not defined.
(b) From Theorem 5-2 we have $dy/dx = 5x^4$ for all x.
(c) From Theorem 5-2 we have $dy/dx = -3x^{-4}$ for $x \neq 0$, and the
derivative is not defined at x = 0.
(d) and (e) From Theorem 5-3 we have
$$ \frac{d}{dx}(\sin 4x) = 4 \cos 4x, \qquad \frac{d}{dx}(\cos 7x) = -7 \sin 7x \quad \text{for all } x. $$
By now it is obvious that Definition 5-3 is a working definition that can
be used. However, some better method than its direct application is obviously
needed to compute derivatives of complicated functions. This requirement will
be systematically pursued in the next section.
5-2 Rules of differentiation
The complicated functions that occur in mathematical and physical studies
are invariably the result of forming sums, products, and quotients of simple
algebraic and trigonometric functions. This suggests that our next task should
comprise a general study of the operation of differentiation when applied to
sums, products, and quotients of arbitrary differentiable functions. We will
present our results in the form of basic theorems which must become
thoroughly familiar to the reader.
theorem 5-4 (differentiation of a sum) If f(x) and g(x) are real valued
functions of x, differentiable at $x_0$, and $k_1$ and $k_2$ are constants, then the
linear combination $k_1 f(x) + k_2 g(x)$ is also differentiable at $x_0$. Furthermore,
$$ \left.\frac{d}{dx}\bigl(k_1 f(x) + k_2 g(x)\bigr)\right|_{x = x_0} = k_1 f'(x_0) + k_2 g'(x_0). $$
Proof Here we must apply Definition 5-3 to the linear combination
$k_1 f(x) + k_2 g(x)$. We obtain
$$ \left.\frac{d}{dx}\bigl(k_1 f(x) + k_2 g(x)\bigr)\right|_{x = x_0} = \lim_{h \to 0} \frac{k_1 f(x_0 + h) + k_2 g(x_0 + h) - [k_1 f(x_0) + k_2 g(x_0)]}{h} $$
$$ = k_1 \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} + k_2 \lim_{h \to 0} \frac{g(x_0 + h) - g(x_0)}{h} $$
$$ = k_1 f'(x_0) + k_2 g'(x_0). $$
If f and g are both differentiable in some common interval J, then the
above argument when applied to each point of J yields the result
$$ \frac{d}{dx}[k_1 f(x) + k_2 g(x)] = k_1 f'(x) + k_2 g'(x), $$
where x is any point of J. The constants $k_1$ and $k_2$ are often absorbed into
the functions f and g, when the result could be expressed 'the derivative of a
sum of functions is equal to the sum of their derivatives'. The task of showing
that this result is true for a linear combination of an arbitrary number of
differentiable functions is left to the reader as an exercise involving proof by
induction.
Example 5-4 Let us use Theorem 5-4 to compute the derivative of
$f(x) = \sin^2 x$.
Solution As it stands we cannot differentiate f(x). However by a well known
trigonometric identity we may transform f(x) to the form
$$ f(x) = \tfrac{1}{2}(1 - \cos 2x), $$
when Theorem 5-4 becomes applicable. Then, using our earlier results
concerning the differentiation of a constant and of cos ax we find that
$$ \frac{d}{dx}(\sin^2 x) = \frac{d}{dx}\{\tfrac{1}{2}(1 - \cos 2x)\} $$
$$ = \frac{d}{dx}(\tfrac{1}{2}) - \frac{d}{dx}(\tfrac{1}{2} \cos 2x) $$
$$ = -\tfrac{1}{2} \frac{d}{dx}(\cos 2x) $$
$$ = -\tfrac{1}{2} \cdot (-2) \sin 2x $$
$$ = 2 \sin x \cos x. $$
theorem 5-5 (differentiation of a product) If f(x) and g(x) are differentiable
real valued functions at $x_0$, then so also is the product function f(x)g(x).
Furthermore,
$$ \left.\frac{d}{dx}\bigl(f(x)g(x)\bigr)\right|_{x = x_0} = f'(x_0)g(x_0) + f(x_0)g'(x_0). $$
Proof Again we consider a difference quotient but this time, for economy
of expression, use the form of limit given in Definition 5-2. We have the
identity
$$ \frac{f(x)g(x) - f(x_0)g(x_0)}{x - x_0} \equiv \left( \frac{f(x) - f(x_0)}{x - x_0} \right) g(x_0) + f(x) \left( \frac{g(x) - g(x_0)}{x - x_0} \right). \tag{I} $$
Now we wish to show that $\lim_{x \to x_0} f(x) = f(x_0)$. This would be true if f(x) were
continuous but we only know that it is differentiable and as yet do not
know that this implies continuity. We shall prove that it does. As f(x) is
differentiable at $x = x_0$ we must have
$$ \frac{f(x) - f(x_0)}{x - x_0} = f'(x_0) + o(h) \quad \text{as } x \to x_0, $$
where $h = x - x_0$. Hence
$$ f(x) - f(x_0) = (x - x_0)[f'(x_0) + o(h)] \quad \text{as } x \to x_0. $$
This implies that if x is taken sufficiently close to $x_0$ then the difference
$f(x) - f(x_0)$ can be made arbitrarily small. This is just our definition of
continuity and so we have proved that differentiability of f(x) at $x_0$ implies
its continuity at that point. Thus we are permitted to write
$$ \lim_{x \to x_0} f(x) = f(x_0) $$
and, similarly,
$$ \lim_{x \to x_0} g(x) = g(x_0). $$
Now
$$ \lim_{x \to x_0} \left( \frac{f(x) - f(x_0)}{x - x_0} \right) = f'(x_0) \quad \text{and} \quad \lim_{x \to x_0} \left( \frac{g(x) - g(x_0)}{x - x_0} \right) = g'(x_0), $$
so, finally, taking the limit of (I) as $x \to x_0$, we obtain the result
$$ \left.\frac{d}{dx}\bigl(f(x)g(x)\bigr)\right|_{x = x_0} = f'(x_0)g(x_0) + f(x_0)g'(x_0). $$
Again, if f and g are both differentiable in some common interval J
then, as before, we obtain the more general result
$$ \frac{d}{dx}\bigl(f(x)g(x)\bigr) = f'(x)g(x) + f(x)g'(x) \quad \text{for } x \in J. $$
As an incidental detail of this proof we have shown that differentiability
at a point implies continuity. This result is worth stating formally.
theorem 5-6 If a real valued function f(x) is differentiable at the point
$x_0$, then it is also continuous there. The converse result is not true.
Proof It only remains to prove that the converse result is not true: namely,
that continuity does not imply differentiability. This has already been seen
in connection with Fig. 5-3, but let us give a specific example. Our final
assertion in Theorem 5-6 will be valid even if we can produce only one
example of a function that is continuous at a point but is not differentiable
there. Such an example used to prove the falsity of an assertion is a counter-example,
and in this case we choose the function f(x) = |x|. This is known
to be continuous at x = 0, but the derivative as defined in Definition 5-3
is not defined at the origin so the function is not differentiable at that point.
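The failure of Definition 5-3 for |x| at the origin is easily exhibited: the difference quotient takes the value +1 for every positive h and -1 for every negative h, so the left- and right-hand limits differ. The short Python sketch below makes this concrete; the helper name `one_sided_quotients` is an assumption introduced for the illustration.

```python
def one_sided_quotients(f, x0, h):
    """Difference quotients at x0 for the steps -h and +h, with h > 0."""
    right = (f(x0 + h) - f(x0)) / h
    left = (f(x0 - h) - f(x0)) / (-h)
    return left, right

for h in (0.1, 0.001, 1e-6):
    left, right = one_sided_quotients(abs, 0.0, h)
    print(f"h = {h:<8} left quotient = {left:+.1f}  right quotient = {right:+.1f}")
```

However small h is taken, the two quotients remain at -1 and +1, so no single limiting value exists.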
Example 5-5 Differentiate the function $f(x) = \sin^2 x$ and compute $f'(\tfrac{1}{4}\pi)$.
Solution We express the function as a product and use Theorem 5-5.
$$ \frac{d}{dx}(\sin^2 x) = \frac{d}{dx}(\sin x \cdot \sin x) $$
$$ = \left[ \frac{d}{dx}(\sin x) \right] \sin x + \sin x \left[ \frac{d}{dx}(\sin x) \right] $$
$$ = 2 \sin x \left[ \frac{d}{dx}(\sin x) \right] $$
$$ = 2 \sin x \cos x. $$
As would be expected, this verifies the result of Example 5-4. Finally,
using this expression we compute
$$ \left.\frac{d}{dx}(\sin^2 x)\right|_{x = \frac{1}{4}\pi} = 2 \sin\frac{\pi}{4} \cos\frac{\pi}{4} = 1. $$
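The result of Example 5-5 may be confirmed numerically. In the Python sketch below the product rule of Theorem 5-5 is coded directly and compared with a symmetric difference quotient; the step h = 1e-6 and the function names `central_difference` and `product_rule` are assumptions made for this check.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def product_rule(f, df, g, dg, x):
    """Theorem 5-5: derivative of the product f(x) g(x) at x."""
    return df(x) * g(x) + f(x) * dg(x)

x0 = math.pi / 4
by_rule = product_rule(math.sin, math.cos, math.sin, math.cos, x0)
numeric = central_difference(lambda x: math.sin(x) ** 2, x0)
print(by_rule, numeric)
```

Both numbers agree with the value 1 found above to within the accuracy of the approximation.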
Our next theorem is important and concerns the rule for differentiating a
composite function or, more simply, the rule for the differentiation of a
function of a function.
theorem 5-7 (differentiation of composite functions) If g(x) is a real
valued differentiable function at $x = x_0$ and f(u) is a real valued differentiable
function at $u = g(x_0)$, then f[g(x)] is differentiable at $x = x_0$. Furthermore,
$$ \left.\frac{d}{dx}\{f[g(x)]\}\right|_{x = x_0} = f'[g(x_0)] \cdot g'(x_0). $$
Proof We have the obvious result
$$ \frac{f[g(x)] - f[g(x_0)]}{x - x_0} = \frac{f[g(x)] - f[g(x_0)]}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{x - x_0}. \tag{A} $$
Since g(x) is differentiable at $x_0$ it is continuous there, and so $g(x) \to g(x_0)$
as $x \to x_0$. So, writing g(x) = u, $g(x_0) = a$ we have
$$ \frac{f[g(x)] - f[g(x_0)]}{x - x_0} = \frac{f(u) - f(a)}{u - a} \cdot \frac{g(x) - g(x_0)}{x - x_0}. $$
Now for ease of argument we shall assume the behaviour of g(x) to be
strictly monotonic in some neighbourhood of $x_0$, so that $g(x) = g(x_0)$ only
when $x = x_0$. In these circumstances the difference quotients on the right-hand
side of (A) are well defined as $x \to x_0$ so that we may take limits and obtain
$$ \left.\frac{d}{dx}\{f[g(x)]\}\right|_{x = x_0} = \lim_{x \to x_0} \left[ \frac{f[g(x)] - f[g(x_0)]}{x - x_0} \right] $$
$$ = \lim_{u \to a} \left[ \frac{f(u) - f(a)}{u - a} \right] \cdot \lim_{x \to x_0} \left[ \frac{g(x) - g(x_0)}{x - x_0} \right] $$
$$ = f'(a) \cdot g'(x_0) $$
$$ = f'[g(x_0)] \cdot g'(x_0). \tag{B} $$
It is not difficult to show that the theorem is still true when g(x) is not
monotonic in some neighbourhood of $x_0$ and an infinite sequence of points
$\{x_i\}$ exist with limit point $x_0$ at all of which $g(x_i) = g(x_0)$.
All that is necessary here is to observe that if $x \to x_0$ through the successive
values $x_i$ of this sequence, then $g(x_i) - g(x_0) = 0$ and so
$$ \frac{g(x_i) - g(x_0)}{x_i - x_0} = 0 \quad \text{for every } i. $$
Hence, by Theorem 3-6, it follows that
$$ \left.\frac{d}{dx}\{g(x)\}\right|_{x = x_0} = 0. $$
However, by the same argument,
$$ \frac{f[g(x_i)] - f[g(x_0)]}{x_i - x_0} = 0 \quad \text{for every } i, $$
showing that
$$ \left.\frac{d}{dx}\{f[g(x)]\}\right|_{x = x_0} = 0, $$
and so result (B) is also valid in this case.
If (B) is true at each point of some interval J, then we have the general
result
$$ \frac{d}{dx}\{f[g(x)]\} = f'[g(x)] \cdot g'(x). $$
When the substitution u = g(x) is made, this result can be written:
$$ \frac{d}{dx} f(u) = \frac{df}{du} \cdot \frac{du}{dx}. \tag{5-6} $$
In this form the theorem is known as the chain rule for differentiation,
and it is this result that is most often found in textbooks. By repeated application,
the chain rule readily extends to enable the differentiation of more
complicated composite functions such as the triple composite function
f{g[h(x)]}, always provided the functions f, g, and h have suitable differentiability
properties. In this case, setting v = h(x) and u = g(v) result (5-6)
takes the form
$$ \frac{d}{dx} f(u) = \frac{df}{du} \cdot \frac{du}{dv} \cdot \frac{dv}{dx}. \tag{5-7} $$
Further extensions of the same kind are obviously possible and are
left to the reader.
Example 5-6 Differentiate the following functions and find the values of
their derivatives at x = 1:
(a) $\sin(x^2 + 3)$;
(b) $(x^3 + x + 1)^{1/3}$;
(c) $\sin\sqrt{(1 + x^2)}$.
Solution (a) Set $u = x^2 + 3$ so that
$$ \frac{d}{dx}[\sin(x^2 + 3)] = \frac{d}{dx}(\sin u). $$
From the chain rule:
$$ \frac{d}{dx}[\sin(x^2 + 3)] = \frac{d}{du}(\sin u) \cdot \frac{du}{dx}. $$
Now $(d/du)(\sin u) = \cos u$, $du/dx = 2x$ so that
$$ \frac{d}{dx}[\sin(x^2 + 3)] = (\cos u) \cdot 2x = 2x \cos(x^2 + 3). $$
Hence at x = 1,
$$ \left.\frac{d}{dx}[\sin(x^2 + 3)]\right|_{x = 1} = 2 \cos 4. $$
(b) This time set $u = x^3 + x + 1$, so that
$$ \frac{d}{dx}[(x^3 + x + 1)^{1/3}] = \frac{d}{dx}(u^{1/3}). $$
From the chain rule:
$$ \frac{d}{dx}[(x^3 + x + 1)^{1/3}] = \frac{d}{du}(u^{1/3}) \cdot \frac{du}{dx}. $$
Hence as $(d/du)(u^{1/3}) = \tfrac{1}{3}u^{-2/3}$, $du/dx = 3x^2 + 1$ we obtain
$$ \frac{d}{dx}[(x^3 + x + 1)^{1/3}] = (\tfrac{1}{3}u^{-2/3}) \cdot (3x^2 + 1). $$
Thus when x = 1,
$$ \left.\frac{d}{dx}[(x^3 + x + 1)^{1/3}]\right|_{x = 1} = \frac{4}{3^{5/3}}. $$
(c) We must use the extension of the chain rule given in Eqn (5-7). Set
$v = 1 + x^2$ when $\sin\sqrt{(1 + x^2)} = \sin\sqrt{v}$, and $u = \sqrt{v}$ when $\sin\sqrt{(1 + x^2)}
= \sin u$.
Then
$$ \frac{d}{dx}[\sin\sqrt{(1 + x^2)}] = \frac{d}{dx}(\sin u) = \frac{d}{du}(\sin u) \cdot \frac{du}{dv} \cdot \frac{dv}{dx} = \cos u \, \frac{du}{dv} \cdot \frac{dv}{dx}. $$
However,
$$ \frac{dv}{dx} = 2x \quad \text{and} \quad \frac{du}{dv} = \frac{1}{2\sqrt{(1 + x^2)}} $$
so that, combining all the results,
$$ \frac{d}{dx}[\sin\sqrt{(1 + x^2)}] = \frac{x \cos\sqrt{(1 + x^2)}}{\sqrt{(1 + x^2)}}. $$
Whence at x = 1,
$$ \left.\frac{d}{dx}[\sin\sqrt{(1 + x^2)}]\right|_{x = 1} = \frac{\cos\sqrt{2}}{\sqrt{2}}. $$
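The three values obtained in Example 5-6 can be checked against a numerical approximation to the derivative at x = 1. The Python sketch below is such a check, not a derivation; the step h = 1e-6 and the names `central_difference` and `CASES` are assumptions made for the illustration.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# (label, function, value of the derivative at x = 1 found by the chain rule)
CASES = [
    ("(a)", lambda x: math.sin(x ** 2 + 3), 2.0 * math.cos(4.0)),
    ("(b)", lambda x: (x ** 3 + x + 1) ** (1.0 / 3.0), 4.0 / 3.0 ** (5.0 / 3.0)),
    ("(c)", lambda x: math.sin(math.sqrt(1 + x ** 2)),
     math.cos(math.sqrt(2.0)) / math.sqrt(2.0)),
]

for label, func, exact in CASES:
    numeric = central_difference(func, 1.0)
    print(f"{label} chain rule = {exact:+.8f}   numeric = {numeric:+.8f}")
```

In each case the two columns agree to the accuracy expected of the difference quotient.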
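theorem 5-8 (differentiation of a quotient) If f(x) and g(x) are real
valued differentiable functions at $x_0$ and $g(x_0) \neq 0$, then the quotient
f(x)/g(x) is differentiable at $x_0$. Furthermore
$$ \left.\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]\right|_{x = x_0} = \frac{g(x_0)f'(x_0) - g'(x_0)f(x_0)}{[g(x_0)]^2}. $$
Proof If we consider the quotient f(x)/g(x) to be the product of the two
functions f(x) and 1/g(x), we have by Theorem 5-5
$$ \left.\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]\right|_{x = x_0} = \frac{1}{g(x_0)} f'(x_0) + f(x_0) \left.\frac{d}{dx}\left[\frac{1}{g(x)}\right]\right|_{x = x_0}. $$
Now we must compute (d/dx)(1/g). We set g(x) = u when, from the chain
rule,
$$ \left.\frac{d}{dx}\left[\frac{1}{g(x)}\right]\right|_{x = x_0} = \left.\frac{d}{dx}\left[\frac{1}{u}\right]\right|_{x = x_0} = \left. -\frac{1}{u^2} \frac{du}{dx} \right|_{x = x_0} = \frac{-g'(x_0)}{[g(x_0)]^2}. $$
Hence, combining our results, we obtain the desired result
$$ \left.\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]\right|_{x = x_0} = \frac{g(x_0)f'(x_0) - g'(x_0)f(x_0)}{[g(x_0)]^2}. $$
As in the other cases the general result follows when the conditions of
the theorem are satisfied throughout some interval J. It has the obvious
form
$$ \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)f'(x) - g'(x)f(x)}{[g(x)]^2}. $$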
Example 5-7 Differentiate $(3x + 1)/(x^2 - 2)$ and determine the values of
x for which the derivative is not defined.
Solution Set f(x) = 3x + 1 and $g(x) = x^2 - 2$. Then f'(x) = 3 and
g'(x) = 2x for all x, whilst g(x) = 0 for $x = \pm\sqrt{2}$. Hence applying
Theorem 5-8 we have
$$ \frac{d}{dx}\left[\frac{3x + 1}{x^2 - 2}\right] = \frac{(x^2 - 2) \cdot 3 - (2x)(3x + 1)}{(x^2 - 2)^2} = -\left[\frac{3x^2 + 2x + 6}{(x^2 - 2)^2}\right], $$
provided $x \neq \pm\sqrt{2}$.
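As a check on the algebra of Example 5-7, the quotient rule of Theorem 5-8 can be evaluated directly and compared with the simplified closed form. In the Python sketch below the sample points 0, 1 and 3 and the function names are assumptions chosen for illustration; the points avoid $x = \pm\sqrt{2}$, where the derivative is not defined.

```python
def quotient_rule(f, df, g, dg, x):
    """Theorem 5-8: derivative of f(x)/g(x) at x, assuming g(x) != 0."""
    return (g(x) * df(x) - dg(x) * f(x)) / g(x) ** 2

f = lambda x: 3.0 * x + 1.0
df = lambda x: 3.0
g = lambda x: x ** 2 - 2.0
dg = lambda x: 2.0 * x

def closed_form(x):
    """Simplified result of Example 5-7."""
    return -(3.0 * x ** 2 + 2.0 * x + 6.0) / (x ** 2 - 2.0) ** 2

for x in (0.0, 1.0, 3.0):
    print(x, quotient_rule(f, df, g, dg, x), closed_form(x))
```

At x = 1, for instance, both expressions give -11.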
To complete this section, Table 5-1 summarizes the results of differentiating
the trigonometric functions. Unfamiliar results may be deduced by directly
applying Theorem 5-8 to the definitions of the functions concerned.
Table 5-1 Derivatives of trigonometric functions
$$ \begin{array}{lll}
\dfrac{d}{dx}(\sin x) = \cos x & \dfrac{d}{dx}(\cos x) = -\sin x & \dfrac{d}{dx}(\tan x) = \sec^2 x \\[2ex]
\dfrac{d}{dx}(\operatorname{cosec} x) = -\operatorname{cosec} x \cot x & \dfrac{d}{dx}(\sec x) = \sec x \tan x & \dfrac{d}{dx}(\cot x) = -\operatorname{cosec}^2 x
\end{array} $$
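The entries of Table 5-1 may be spot-checked numerically before they are used. The Python sketch below compares each tabulated derivative with a symmetric difference quotient at the single sample point x = 0.7; that point, the step size, and all the names used are assumptions made for this check.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def cosec(x):
    return 1.0 / math.sin(x)

def sec(x):
    return 1.0 / math.cos(x)

def cot(x):
    return math.cos(x) / math.sin(x)

# name: (function, derivative as quoted in Table 5-1)
TABLE = {
    "sin": (math.sin, math.cos),
    "cos": (math.cos, lambda x: -math.sin(x)),
    "tan": (math.tan, lambda x: sec(x) ** 2),
    "cosec": (cosec, lambda x: -cosec(x) * cot(x)),
    "sec": (sec, lambda x: sec(x) * math.tan(x)),
    "cot": (cot, lambda x: -cosec(x) ** 2),
}

x = 0.7  # sample point away from the zeros of sin x and cos x
for name, (func, derivative) in TABLE.items():
    print(f"{name:6s} numeric = {central_difference(func, x):+.6f}"
          f"  table = {derivative(x):+.6f}")
```

Agreement at one point does not prove the table, but a disagreement would reveal a slip in sign or in the choice of function.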
5-3 Some important consequences of differentiability
We preface this section by proving a result that belongs more properly to
Chapter 3 since it depends for its validity only on the property of continuity.
Our sole reason for discussing it here is to present it in the context in which it
will first be used. It is usually known by the name of the intermediate value
theorem and we shall now show that the idea underlying it is extremely
simple.
Consider the situation in which a recording thermometer attached to
some piece of equipment records its temperature at pre-assigned times.
Suppose, for instance, that at times $t_1$ and $t_2$ the temperatures recorded were
$T_1$ and $T_2$, respectively. Then although there is no record of the variation of
the temperature T(t) at times t between $t_1$ and $t_2$, it may be safely inferred
that the temperature will pass at least once through each intermediate value
between $T_1$ and $T_2$. It is quite possible for the temperature to assume values
that do not lie between $T_1$ and $T_2$, but no assertion can be made about such
an event. The situation is illustrated in Fig. 5-5 where $T^*$ is a typical temperature
intermediate between $T_1$ and $T_2$, and the dotted and solid lines
represent two possible temperature variations with time.
This physical situation is an example of the operation of the intermediate
value theorem in everyday life, and we are able to make our assertion because
we know from experience that however rapidly a temperature may change,
it can never undergo an abrupt jump. In mathematical terms we are saying
that temperature change must be a continuous process. Expressed like this
the result seems obvious, but how may we prove it? Our simple proof relies
on the postulate of Section 3-2, which asserts that every bounded monotonic
sequence tends to a limit, but first we state the formal result.
theorem 5-9 (intermediate value theorem) Let the real valued function
f(x) be continuous on the closed interval [a, b] and such that /(a) =£f(b).
Then if y* is any number intermediate between f(a) and f(b), there exists a
number x* between a and b such that y* = /(**).
Proof Although a diagram is not essential for this proof, the representative
situation shown in Fig. 5-6 will be of help.
First set $x_1 = \tfrac{1}{2}(a + b)$; then if $f(x_1) = y^*$ the result is proved. If not,
consider the intervals $(a, x_1)$, $(x_1, b)$. Then in one of these two intervals, y*
will lie between the functional values occurring at either end of the interval.
Call this interval $I_1$ and let it be represented by the open interval $(a_1, b_1)$.
Thus in Fig. 5-6, $I_1$ is the right-hand interval and so in that case
$a_1 = \tfrac{1}{2}(a + b)$, $b_1 = b$.
Next set $x_2 = \tfrac{1}{2}(a_1 + b_1)$. If $f(x_2) = y^*$ the result is proved. If not,
consider the intervals $(a_1, x_2)$, $(x_2, b_1)$. Then in one of these two intervals, y*
will lie between the functional values occurring at either end of the interval.
Call this interval $I_2$ and let it be represented by the open interval $(a_2, b_2)$.
In Fig. 5-6 the interval $I_2$ is the left-hand sub-interval of $I_1$, so that
$a_2 = a_1$, $b_2 = \tfrac{1}{2}(a_1 + b_1)$.
We either prove the result directly for some $x_n$ or we define an infinite
sequence of open intervals $I_1 \supset I_2 \supset I_3 \supset \cdots$. Because each interval is
contained by all its predecessors it then follows that the sequence of numbers
$a_1, a_2, a_3, \ldots$ is monotonic increasing and bounded above whilst the
sequence of numbers $b_1, b_2, b_3, \ldots$ is monotonic decreasing and bounded
below. Hence by the postulate of Section 3-2, the sequences $\{a_i\}$ and $\{b_i\}$ both
tend to a limit. That they both tend to the same limit follows from the fact
that the length of the nth interval $I_n$ is $(b - a)/2^n$, which tends to zero as
$n \to \infty$. Letting the common value of these two limits be denoted by x*
Fig. 5-5 Physical illustration of intermediate value theorem.
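we have, by the continuity of f, $\lim_{n \to \infty} |f(a_n) - f(x^*)| = 0$ and likewise
$\lim_{n \to \infty} |f(b_n) - f(x^*)| = 0$. Since y* lies between $f(a_n)$ and $f(b_n)$ for
every n, it follows that f(x*) = y*, thereby showing the existence of the
required number x*.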
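The interval-halving argument of this proof can be followed numerically. The sketch below is an illustration only: the name `bisect_ivt`, the tolerance, the step limit and the sample function $x^3$ are assumptions made for the example, and continuity of f is taken on trust rather than checked.

```python
def bisect_ivt(f, a, b, y_star, tol=1e-12, max_steps=200):
    """Approximate x* in [a, b] with f(x*) = y_star by repeated halving.

    Assumes f is continuous on [a, b] and that y_star lies between
    f(a) and f(b); these are the hypotheses of the theorem.
    """
    fa, fb = f(a), f(b)
    if fa == y_star:
        return a
    if fb == y_star:
        return b
    if (fa - y_star) * (fb - y_star) > 0:
        raise ValueError("y_star does not lie between f(a) and f(b)")
    for _ in range(max_steps):
        x = 0.5 * (a + b)              # midpoint x_n of the current interval
        fx = f(x)
        if fx == y_star or (b - a) < tol:
            return x                   # proved directly, or interval is tiny
        if (fa - y_star) * (fx - y_star) < 0:
            b, fb = x, fx              # y_star is bracketed on the left half
        else:
            a, fa = x, fx              # otherwise on the right half
    return 0.5 * (a + b)


def cube(x):
    return x ** 3


x_star = bisect_ivt(cube, 0.0, 2.0, 5.0)
print(x_star, cube(x_star))
```

Each pass keeps the half-interval on which y* still lies between the end values, so the left ends never decrease and the right ends never increase, exactly as in the sequences $\{a_i\}$ and $\{b_i\}$ of the proof.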
The following is an obvious consequence of the intermediate value
theorem :
Corollary 5-9 Every function that is continuous in a closed interval attains
both its greatest and least values at points of that interval. These values may
occur at the end points of the interval.
Fig. 5-6 Intermediate value theorem.
5-3 (a) Maxima and minima
One of the most familiar and useful applications of differentiation is to the
problem of determining those points in some interval [a, b] at which a
function f(x) assumes its maximum and minimum values. Collectively these
values are known as the extrema of the function f(x) on the interval [a, b] and
they are of various types as this definition indicates.
Definition 5-4 (extrema) Let f(x) be a continuous function defined on
the interval [a, b] so that it attains its greatest and least values at points of
that interval. Then we say that the point $x_0$ belonging to [a, b] is:
(a) an absolute maximum if $f(x_0) > f(x)$ for all points x in [a, b];
(b) an absolute minimum if $f(x_0) < f(x)$ for all points x in [a, b];
(c) a relative maximum if $f(x_0 + h) - f(x_0) < 0$ for |h| sufficiently small;
(d) a relative minimum if $f(x_0 + h) - f(x_0) > 0$ for |h| sufficiently small.
No assumption of differentiability has been made when formulating this
definition so that in Fig. 5-7, point P is an absolute maximum and both
points R and T are relative maxima. Point Q is an absolute minimum and
point S a relative minimum. Although the functional value at U lies inter-
mediate between those at Q and S, it is not a relative minimum in the sense
of the definition, because it lies at the end of the domain of definition [a, b]
so that only the one-sided behaviour of the function is known there with
respect to h.
Fig. 5-7 Extrema of a function on [a, b].
If now, in addition to continuity, we also require of f(x) that it be
differentiable at the point $x_0$ occurring in Definition 5-4, we can easily devise a
simple test to identify the points where extrema must occur. Consider point P in
Fig. 5-7 as representative of a maximum at which the function is differentiable.
The fact that P happens to be an absolute maximum is immaterial for the
subsequent argument.
By supposition, if f is differentiable at P, the expression
$$f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}$$
must be independent of the manner of approach of x to $x_0$. Now for maxima
of types (a) and (c) we have $f(x) - f(x_0) < 0$, and hence it follows that when
$x < x_0$, $f'(x_0)$ is the limit of an essentially positive function; whereas when
$x > x_0$, $f'(x_0)$ is the limit of an essentially negative function. Clearly this is
only possible if $f'(x_0) = 0$. We have thus proved that if f is differentiable at
$x_0$, then a necessary condition that f should have a maximum at $x_0$ is $f'(x_0) = 0$.
Similar reasoning establishes that the condition $f'(x_0) = 0$ is also a
necessary condition for the differentiable function f to have a minimum at $x_0$. To
show that the vanishing of the derivative f' at a point is not a sufficient
condition for that point to be an extremum, we appeal to a counter-example.
The function $f = x^3$ has a continuous derivative $f' = 3x^2$ which vanishes at
the origin. Nevertheless, f is negative for x < 0 and f is positive for x > 0,
thereby showing that despite the vanishing of the derivative, neither a
maximum nor a minimum of the function can occur at the origin. Later we
shall identify behaviour of this nature as typical of a point of inflection with a
horizontal tangent. Generally speaking, a point of inflection is a demarcation
point on the graph of a differentiable function separating a region of con-
vexity from a region of concavity. Collectively the points at which the deriva-
tive vanishes, regardless of whether or not they are maxima, minima, or points
of inflection are called critical points or stationary points of the function.
Combining the previous results, and recalling that the condition that f be
differentiable at $x_0$ precludes behaviour of the type encountered at point T
in Fig. 5-7, we are able to formulate the following general result.
Theorem 5-10 Let f be a real valued differentiable function on some
interval [a, b]. Then the stationary points of f are the numbers $\xi$ for which
$$f'(\xi) = 0.$$
Once the stationary points of a function have been determined it is
necessary to examine the functional behaviour in the vicinity of each one in
order to determine the nature of the point involved. An absolute maximum
is identified from amongst the relative maxima by direct comparison of the
functional values at the stationary points in question. A similar process
identifies an absolute minimum.
Example 5-8 Without appealing to graphical ideas, find the location and
nature of the extrema of the following two functions and determine if they
are differentiable at these points :
(a) $f(x) = \tfrac{1}{3}x^3 + 2x^2 + 3x + 1$;
(b) $f(x) = (2x - 5)x^{2/3}$.
Solution (a) The stationary points are determined by finding those values
$x = \xi$ for which the derivative f' vanishes.
Now $f' = x^2 + 4x + 3$ and so the desired stationary points are given by
the roots of the equation
$$\xi^2 + 4\xi + 3 = 0.$$
These roots are $\xi = -1$ and $\xi = -3$, and the functional values at the
respective points are $f(-1) = -\tfrac{1}{3}$ and $f(-3) = 1$. As the derivative f' is the
sum of continuous functions it is everywhere continuous, so that no cusp-like
behaviour with associated extrema as typified by point T in Fig. 5-7 can arise.
So the two points $\xi = -1$ and $\xi = -3$ are the only ones at which stationary
values can occur. An examination of the behaviour of the function near these
points will determine if these stationary values correspond to maxima,
minima, or points of inflection.
A sketch graph would quickly show that in fact $\xi = -3$ corresponds to
a local maximum and $\xi = -1$ to a local minimum, but we are specifically
required to establish these results by analytical means. How then can we do
this? The solution lies in a direct application of Definition 5-4, and we
illustrate the argument by considering the stationary point $\xi = -1$. To find
the behaviour of f close to $\xi = -1$ we shall set $x = -1 + h$, where h is
small, and substitute in f(x) to obtain
$$f(-1 + h) = \tfrac{1}{3}(-1 + h)^3 + 2(-1 + h)^2 + 3(-1 + h) + 1,$$
whence,
$$f(-1 + h) = -\tfrac{1}{3} + h^2 + \tfrac{1}{3}h^3.$$
Now $f(-1) = -\tfrac{1}{3}$ so that we may also write this result in the form
$$f(-1 + h) - f(-1) = h^2\left(1 + \frac{h}{3}\right).$$
Clearly for |h| small, the right-hand side is essentially positive, and so we
have succeeded in showing that close to $\xi = -1$,
$$f(\xi + h) - f(\xi) > 0,$$
and so by Definition 5-4 (d) the stationary point $\xi = -1$, at which
$f(\xi) = -\tfrac{1}{3}$, is seen to be a local minimum. An exactly similar argument will
establish that the stationary point $\xi = -3$, at which $f(\xi) = 1$, is a local
maximum. These are only local extrema because it is possible to find values
of x for which f > 1 and $f < -\tfrac{1}{3}$.
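The increment used in Solution (a) can be checked with exact rational arithmetic. The following Python sketch is an illustration under stated assumptions: the helper names `increment` and `classify` and the sample offset h = 1/1000 are choices made here, and testing one value of h only suggests the classification that the algebra in the text establishes.

```python
from fractions import Fraction


def f(x):
    # f(x) = x^3/3 + 2x^2 + 3x + 1, evaluated exactly for rational x
    return Fraction(1, 3) * x ** 3 + 2 * x ** 2 + 3 * x + 1


def increment(xi, h):
    """Return f(xi + h) - f(xi), the quantity tested in Definition 5-4."""
    return f(xi + h) - f(xi)


def classify(xi, h=Fraction(1, 1000)):
    """Classify xi from the sign of the increment at the offsets -h and +h."""
    left, right = increment(xi, -h), increment(xi, h)
    if left > 0 and right > 0:
        return "local minimum"
    if left < 0 and right < 0:
        return "local maximum"
    return "neither"


h = Fraction(1, 1000)
# Compare the exact increment at xi = -1 with h^2 (1 + h/3)
print(increment(Fraction(-1), h) == h ** 2 * (1 + h / 3))
print(classify(Fraction(-1)), classify(Fraction(-3)))
```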
Solution (b) This case is more complicated. We have
$$\frac{df}{dx} = 2x^{2/3} + \frac{2(2x - 5)}{3x^{1/3}},$$
showing that the stationary points of f are determined by the roots of the
equation
$$0 = 2\xi^{2/3} + \frac{2(2\xi - 5)}{3\xi^{1/3}}.$$
This has the single root $\xi = 1$ at which $f(1) = -3$, showing that the function
has only one stationary point. To determine the nature of this point let us set
$x = 1 + h$, where |h| is small, and substitute into f(x) to find
$$f(1 + h) = (2h - 3)(1 + h)^{2/3}.$$
Next we expand the factor $(1 + h)^{2/3}$ by the binomial theorem as far as
terms involving $h^2$ to obtain
$$f(1 + h) = (2h - 3)\left(1 + \tfrac{2}{3}h - \tfrac{1}{9}h^2 + O(h^3)\right)$$
or,
$$f(1 + h) = -3 + \tfrac{5}{3}h^2 + O(h^3).$$
Using the fact that $f(1) = -3$ this becomes
$$f(1 + h) - f(1) = \tfrac{5}{3}h^2 + O(h^3)$$
showing that close to $\xi = 1$, $f(\xi + h) - f(\xi) > 0$. Hence by Definition
5-4 (d), the stationary point $\xi = 1$ is seen to correspond to a local minimum.
Again, it is only a local minimum because for large negative x we have
$f < -3$.
We now observe that f' is defined for all x other than for x = 0, at which
point f(0) = 0. The behaviour of the function in the vicinity of the origin
needs examination since, as it is not differentiable there, Theorem 5-10 can
provide no information about that point. Set x = h, where h is small, and
substitute in f to get
$$f(h) = (2h - 5)h^{2/3}.$$
Now f(0) = 0, so that we may rewrite this as
$$f(h) - f(0) = (2h - 5)h^{2/3},$$
thereby showing that as the right-hand side is essentially negative for suitably
small h, close to $\xi = 0$ we have $f(\xi + h) - f(\xi) < 0$. From Definition 5-4 (c)
we now see that the origin is a local maximum, despite the fact that f is not
differentiable at that point. It is only a local maximum because for large
positive x we have f > f(0). For reference purposes the function is shown in
Fig. 5-8.
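The same increment test may be tried numerically on the function of Solution (b), including the origin where the derivative does not exist. This Python sketch is illustrative only: the helper name `increments` and the offset h = 0.001 are assumptions, and $x^{2/3}$ is computed as the real cube root of $x^2$ so that negative x is allowed.

```python
def f(x):
    # (2x - 5) x^(2/3), with x^(2/3) taken as the real cube root of x^2
    return (2.0 * x - 5.0) * (x * x) ** (1.0 / 3.0)


def increments(xi, h):
    """Return (f(xi - h) - f(xi), f(xi + h) - f(xi))."""
    return f(xi - h) - f(xi), f(xi + h) - f(xi)


h = 1e-3
left_1, right_1 = increments(1.0, h)    # near the stationary point xi = 1
left_0, right_0 = increments(0.0, h)    # near the origin, where f' does not exist
print(left_1 > 0 and right_1 > 0)       # both positive suggests a local minimum
print(left_0 < 0 and right_0 < 0)       # both negative suggests a local maximum
print(right_1 / h ** 2)                 # to be compared with the coefficient 5/3
```

The last printed ratio estimates the coefficient of $h^2$ in the expansion of f(1 + h) - f(1).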
The method of classification of stationary points that we have just illus-
trated is always applicable, though it provides more information than is
often required. This is so because not only does it discriminate between
maxima and minima, but it also provides the approximate behaviour of the
function close to the point in question. We shall return to this problem later
to provide much simpler criteria by which the nature of stationary points
may be identified.
5-3 (b) Rolle's theorem
One form of Rolle's theorem may be stated as follows.
Theorem 5-11 Let f be a real valued function that is continuous on the
closed interval [a, b] and differentiable at all points of the open interval
(a, b). Then if f(a) = f(b) there is at least one point $x = \xi$ interior to (a, b)
at which $f'(\xi) = 0$.
Fig. 5-8 $y = (2x - 5)x^{2/3}$.
Proof We know from Corollary 5-9 that a continuous function f(x) defined
on the closed interval [a, b] must attain its maximum value M and its
minimum value m at points of [a, b]. Then if m = M on [a, b], the function
f(x) = constant, and since the derivative of a constant is zero, the point
$x = \xi$ at which $f'(\xi) = 0$ may be taken anywhere within the interval.
If f(x) is not a constant function then $m \neq M$, and as f(a) = f(b) it
follows that at least one of the numbers m, M must differ from the value
f(a). We shall suppose that $M \neq f(a)$. Then clearly the value M must be
attained at some point $x = \xi$ interior to (a, b). As f is assumed to be
differentiable in (a, b) it follows that Theorem 5-10 must be applicable showing that
$f'(\xi) = 0$. A similar argument applies if $m \neq f(a)$. Geometrically this theorem
simply asserts that the graph of any function satisfying the conditions of the
theorem must have at least one point in the interval [a, b] at which the
tangent to the curve is horizontal.
If f is not differentiable at even one interior point of (a, b) then Rolle's
theorem cannot be applied. Our counter-example in this instance is the
simple function f(x) = |x| with $-1 \le x \le 1$. This function is everywhere
continuous, and is differentiable at all points other than at the origin, but
there is certainly no point $x = \xi$ on [-1, 1] at which f' = 0. The graph of
this function is shown in Fig. 5-9, with one of a function g(x) not satisfying
the conditions of the theorem but for which the result happens to be true.
Fig. 5-9 Counter examples for Rolle's theorem: (a) Rolle's theorem does not apply,
there being no point $\xi$ for which $f'(\xi) = 0$; (b) $g'(\xi) = 0$, but Rolle's theorem
does not apply.
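The contrast between a function satisfying the conditions of Rolle's theorem and the counter-example |x| can be seen numerically. The sketch below is only an illustration: the grid search in `interior_extremum` is a stand-in for the existence argument of the proof, and the helper names, grid size and step h are assumptions made for the example.

```python
import math


def interior_extremum(f, a, b, samples=2001):
    """Return the interior grid point where f differs most from f(a)."""
    xs = [a + (b - a) * i / (samples - 1) for i in range(1, samples - 1)]
    x_max = max(xs, key=f)
    x_min = min(xs, key=f)
    if abs(f(x_max) - f(a)) >= abs(f(x_min) - f(a)):
        return x_max
    return x_min


def one_sided_slopes(f, x, h=1e-6):
    """Return the left and right difference quotients of f at x."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


xi_smooth = interior_extremum(math.sin, 0.0, math.pi)   # sin(0) = sin(pi) up to rounding
xi_corner = interior_extremum(abs, -1.0, 1.0)           # f(x) = |x| on [-1, 1]
print(xi_smooth, one_sided_slopes(math.sin, xi_smooth))
print(xi_corner, one_sided_slopes(abs, xi_corner))
```

For the sine function both one-sided quotients at the interior maximum are close to zero, whereas for |x| they stay near -1 and +1 at the origin, so no common value f' = 0 can be assigned there.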
5-3 (c) Mean value theorems for derivatives
Our most important application of Rolle's theorem will be in the proof of
the mean value theorem for derivatives. In a first account of the subject it is
difficult to indicate just how valuable and powerful this deceptively simple
theorem really is as an analytical tool. However something of its utility will,
perhaps, be appreciated after studying the remainder of this chapter. First
let us present an intuitive approach to the theorem.
Fig. 5-10 Illustration of the mean value theorem.
Consider Fig. 5-10 which represents a graph of a differentiable function
f(x) on the open interval (a, b). Then as P and S are the points (a, f(a)) and
(b, f(b)), the gradient m of the line PS is
$$m = \frac{f(b) - f(a)}{b - a}.$$
Now we may identify points Q and R, with respective x-coordinates $\xi$ and $\eta$
interior to (a, b), at which the tangent lines $l_1$ and $l_2$ to the graph are parallel
to PS, and so must also have the same gradient m. Then because of the
geometrical interpretation of the derivative f' as the gradient of the tangent
line, at either Q or R we may equate m and f'. If we confine attention to point
Q we have
$$\frac{f(b) - f(a)}{b - a} = f'(\xi),$$
where $a < \xi < b$. This is the form in which the mean value theorem for
derivatives, also known as the law of the mean, is usually quoted. In
geometrical terms the theorem asserts that there is always a point $(\xi, f(\xi))$ on
the graph of the function, with $a < \xi < b$, at which the tangent to the curve is
parallel to the secant line PS. The fact that the precise value of $\xi$ is not
usually known is, generally speaking, unimportant in the application of this
theorem. This is because it is often used with some limiting argument in
which $b \to a$, so that $\xi \to a$ also. A formal statement of the theorem is as
follows.
Theorem 5-12 (mean value theorem for derivatives) If f(x) is a real valued
function that is continuous in [a, b] and differentiable in (a, b), then there
exists a point $\xi$ interior to (a, b) such that
$$\frac{f(b) - f(a)}{b - a} = f'(\xi).$$
The existence of more than one point $\xi$ in (a, b) at which this result is
true is not precluded. This is so because it is only asserted that such a point
exists, and not that there is necessarily only one such point. Such is the case,
for example, in Fig. 5-10 since as was remarked, $f'(\xi) = f'(\eta)$ with $\xi \neq \eta$,
though both points $\xi$ and $\eta$ are interior to (a, b).
Many people would regard the argument above as proof enough of the
mean value theorem, but for the more critical reader we now offer the
promised proof based on Rolle's theorem.
Proof As with the proofs of many mathematical theorems, our result is
established more easily by a somewhat artificial approach than by a direct
method. Here we shall utilize the intuitively obtained result above to suggest
the form of a special function F(x) to which Rolle's theorem can be applied,
thereby yielding the desired result.
Specifically, since by implication the result depends on f(x) and x, we
shall try to find the simplest function F(x) that depends on f(x) and x,
that is continuous in [a, b] and is differentiable in (a, b), and is such that
F(a) = F(b). The value of F(a) may be assigned arbitrarily and F(x) will
still satisfy Rolle's theorem, so to simplify slightly the working we shall
assume that F(a) = F(b) = 0.
We consider the obvious function
$$F(x) = A + Bx + f(x)$$
which clearly satisfies the continuity and differentiability conditions of
Rolle's theorem. The constants A and B must be chosen in order that
F(a) = F(b) = 0.
Thus
$$0 = A + Ba + f(a)$$
and
$$0 = A + Bb + f(b)$$
from which it follows that,
$$A = a\left(\frac{f(b) - f(a)}{b - a}\right) - f(a), \qquad B = -\frac{f(b) - f(a)}{b - a}.$$
Hence F(x) has the form
$$F(x) = f(x) - f(a) + \left(\frac{f(b) - f(a)}{b - a}\right)(a - x).$$
Thus we have succeeded in finding a function F(x) with the desired properties
which satisfies Rolle's theorem. Differentiating F(x) we obtain
$$F'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}.$$
Now by Rolle's theorem there exists a point $\xi$, with $a < \xi < b$, such that
$F'(\xi) = 0$ and so we have our desired result
$$f'(\xi) = \frac{f(b) - f(a)}{b - a}.$$
Since we may write $\xi = a + \theta(b - a)$, where $0 < \theta < 1$, this result is
sometimes expressed in the following form attributable to Cauchy,
$$f(b) - f(a) = (b - a)f'[a + \theta(b - a)] \quad \text{with } 0 < \theta < 1.$$
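The construction used in this proof can be illustrated numerically. The Python sketch below is an illustration under stated assumptions: the name `mean_value_point`, the sample function $x^3$ on [0, 3] and the use of bisection on F' are choices made here, and the search only works when F' takes opposite signs at the two ends of the interval.

```python
import math


def mean_value_point(f, fprime, a, b, tol=1e-12):
    """Locate xi in (a, b) with f'(xi) = (f(b) - f(a)) / (b - a).

    Bisection is applied to F'(x) = f'(x) - m, where F is the auxiliary
    function of the proof; F' is assumed to change sign between a and b.
    """
    m = (f(b) - f(a)) / (b - a)                  # gradient of the chord PS
    F = lambda x: f(x) - f(a) + m * (a - x)      # auxiliary function F(x)
    Fp = lambda x: fprime(x) - m                 # its derivative F'(x)
    lo, hi = a, b
    if Fp(lo) * Fp(hi) > 0:
        raise ValueError("F' does not change sign on [a, b]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Fp(lo) * Fp(mid) <= 0:
            hi = mid
        else:
            lo = mid
    xi = 0.5 * (lo + hi)
    theta = (xi - a) / (b - a)                   # xi = a + theta (b - a)
    return xi, theta, F(a), F(b)


def cubic(x):
    return x ** 3


def cubic_prime(x):
    return 3.0 * x ** 2


xi, theta, Fa, Fb = mean_value_point(cubic, cubic_prime, 0.0, 3.0)
print(xi, theta, Fa, Fb)
```

For this sample function the chord has gradient 9, so the point sought satisfies $3\xi^2 = 9$, and the returned values of F(a) and F(b) confirm that the auxiliary function vanishes at both ends.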
By applying the same arguments to a suitably constructed function
$\varphi(x)$, analogous to F(x), it is a simple matter to prove the following extension
of the mean value theorem due to Cauchy. (See Problem 5-37.)
Corollary 5-12 If g'(x) = h'(x) at all points of [a, b], then g(x) = h(x)
+ constant in [a, b].
Proof Set f = g - h in Theorem 5-12 applied to the interval [a, x]. Then f' = 0
throughout, so that g(x) - h(x) = g(a) - h(a) = constant and the result follows.
Theorem 5-13 (Cauchy extended mean value theorem) If f(x) and g(x)
are real valued functions that are continuous in [a, b] and differentiable in
(a, b) and $g'(x) \neq 0$ in (a, b), then there exists a point $\xi$ interior to (a, b)
such that
$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(\xi)}{g'(\xi)}.$$
5-3 (d) Indeterminate forms — L'Hospital's rule
Limits such as lim (sin ax)/x which apparently tend to the form 0/0 have
already been encountered and given meaning in special cases. A closely
related problem is that of giving meaning to the limit of a quotient which
apparently tends to $\infty/\infty$. These limit problems are both called indeterminate
forms. One of the most obvious applications of the extended mean value
theorem is to resolve the value of the limit in either of these situations, and
we now prove the simplest statement of a useful result generally known as
L'Hospital's Rule.
Theorem 5-14 (first form of L'Hospital's rule) If f(x) and g(x) are real
valued differentiable functions at $x = x_0$ and,
(a) $f(x_0) = g(x_0) = 0$,
(b) $\lim_{x \to x_0} \dfrac{f'(x)}{g'(x)} = \lambda$, where $\lambda$ is either a real number or infinity,
then
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)} = \lambda.$$
Proof Apply the extended mean value theorem to the functions f(x) and
g(x) defined on the interval $[x, x_0]$ and use condition (a) to obtain
$$\frac{f(x)}{g(x)} = \frac{f'(\xi)}{g'(\xi)},$$
where $x < \xi < x_0$.
Now $x \to x_0$ implies that $\xi \to x_0$, so that by condition (b) we have the
desired result
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{\xi \to x_0} \frac{f'(\xi)}{g'(\xi)} = \lambda.$$
The fact that the variable $\xi$ appears in the second limit in place of the x
stated in the theorem is unimportant. Its function is simply that of a variable
and the symbol used to denote it is immaterial.
In general, when the symbol used to denote a variable is unimportant
because it only appears in some intermediate calculation, the details of which
do not concern us, we shall call it a dummy variable.
A useful extension of L'Hospital's rule is contained in the following
corollary which allows examination of limits which tend to the form $\infty/\infty$.
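Corollary 5-14 If $\varphi(x)$ and $\psi(x)$ are real valued differentiable functions at
$x = x_0$ and,
(a) $\lim_{x \to x_0} \varphi(x) = \pm\infty$, $\lim_{x \to x_0} \psi(x) = \pm\infty$,
(b) $\lim_{x \to x_0} \dfrac{\varphi'(x)}{\psi'(x)} = \lambda$, where $\lambda$ is either a real number or infinity,
then
$$\lim_{x \to x_0} \frac{\varphi(x)}{\psi(x)} = \lim_{x \to x_0} \frac{\varphi'(x)}{\psi'(x)} = \lambda.$$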
Proof Apply the extended mean value theorem to the quotient $\varphi(x)/\psi(x)$ in
the open interval $(x, x_1)$ with $x_0 < x < x_1$, and write the result in the form
$$\frac{\varphi(x)}{\psi(x)} = \left[\frac{1 - \dfrac{\psi(x_1)}{\psi(x)}}{1 - \dfrac{\varphi(x_1)}{\varphi(x)}}\right] \frac{\varphi'(\xi)}{\psi'(\xi)},$$
where $x < \xi < x_1$. Then, taking $x_1$ fixed and arbitrarily close to $x_0$ so that
$\xi \to x_0$, allow $x \to x_0$. The first factor on the right-hand side then approaches
arbitrarily close to unity thereby giving rise to the stated result. A
modification of this argument shows that the result is also true if $x_0 \to \infty$.
Example 5-9 Determine the value of the following indeterminate forms
using L'Hospital's rule and Corollary 5-14:
(a) $\lim_{x \to 0} \dfrac{\sin ax}{x}$;
(b) $\lim_{x \to 1} \dfrac{x^3 + 3x^2 - 2x - 2}{2x^2 - x - 1}$;
(c) $\lim_{x \to 0} \dfrac{\sin 3x}{x^3}$;
(d) $\lim_{x \to \frac{1}{2}\pi} \dfrac{\tan 3x}{\tan x}$;
(e) $\lim_{x \to 0} \dfrac{(a/x)}{\cot bx}$.
Solution (a) This is of the form $\lim f/g \to 0/0$ with $f(x) = \sin ax$ and
g(x) = x. As $f'(x) = a \cos ax$ and $g'(x) = 1$ it follows that
$$\lim_{x \to 0} \frac{\sin ax}{x} = \lim_{x \to 0} \frac{a \cos ax}{1} = a.$$
This confirms the limit that was obtained by a different method in Chapter 3.
(b) This is also of the form $\lim f/g \to 0/0$ but this time with
$f(x) = x^3 + 3x^2 - 2x - 2$ and $g(x) = 2x^2 - x - 1$. It follows that
$f'(x) = 3x^2 + 6x - 2$ and $g'(x) = 4x - 1$ so that
$$\lim_{x \to 1} \frac{x^3 + 3x^2 - 2x - 2}{2x^2 - x - 1} = \lim_{x \to 1} \frac{3x^2 + 6x - 2}{4x - 1} = \frac{7}{3}.$$
(c) This is again of the form $\lim f/g \to 0/0$ with $f(x) = \sin 3x$ and
$g(x) = x^3$. Hence $f'(x) = 3 \cos 3x$ and $g'(x) = 3x^2$ so that
$$\lim_{x \to 0} \frac{\sin 3x}{x^3} = \lim_{x \to 0} \frac{\cos 3x}{x^2} \to +\infty.$$
(d) This is of the form $\lim f/g \to \infty/\infty$ with $f(x) = \tan 3x$ and
$g(x) = \tan x$. Hence $f'(x) = 3 \sec^2 3x$ and $g'(x) = \sec^2 x$ and by Corollary 5-14,
$$\lim_{x \to \frac{1}{2}\pi} \frac{\tan 3x}{\tan x} = \lim_{x \to \frac{1}{2}\pi} \frac{3 \sec^2 3x}{\sec^2 x} = 3 \lim_{x \to \frac{1}{2}\pi} \frac{\cos^2 x}{\cos^2 3x}.$$
This is again an indeterminate form, but now of the type 0/0. Applying
Theorem 5-14 we have
$$3 \lim_{x \to \frac{1}{2}\pi} \frac{\cos^2 x}{\cos^2 3x} = 3 \lim_{x \to \frac{1}{2}\pi} \frac{2 \sin x \cos x}{6 \sin 3x \cos 3x} = \lim_{x \to \frac{1}{2}\pi} \left(\frac{\sin x}{\sin 3x}\right) \cdot \lim_{x \to \frac{1}{2}\pi} \left(\frac{\cos x}{\cos 3x}\right)$$
and hence
$$\lim_{x \to \frac{1}{2}\pi} \frac{\tan 3x}{\tan x} = -\lim_{x \to \frac{1}{2}\pi} \frac{\cos x}{\cos 3x}.$$
This last result is yet again an indeterminate form of the type 0/0 so that a
further application of Theorem 5-14 finally gives
$$\lim_{x \to \frac{1}{2}\pi} \frac{\tan 3x}{\tan x} = -\lim_{x \to \frac{1}{2}\pi} \frac{\sin x}{3 \sin 3x} = \frac{1}{3}.$$
(e) This is of the form $\lim f/g \to \infty/\infty$ but it is easily seen that an
application of Corollary 5-14 will not simplify the limit to be evaluated. Instead, we
rewrite the limit in the form
$$\lim_{x \to 0} \frac{(a/x)}{\cot bx} = \lim_{x \to 0} \frac{a \tan bx}{x}$$
when it is seen that the alternative form is of the type $\lim f/g \to 0/0$ with
$f(x) = a \tan bx$ and g(x) = x. Now $f'(x) = ab \sec^2 bx$ and $g'(x) = 1$ so that
by Theorem 5-14,
$$\lim_{x \to 0} \frac{(a/x)}{\cot bx} = \lim_{x \to 0} \frac{ab \sec^2 bx}{1} = ab.$$
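The limits of Example 5-9 can be compared with direct evaluation of each quotient close to the limit point. The following Python sketch is an illustration only: the numerical values a = 2 and b = 3, the offset eps and the helper name `sample` are assumptions made for the example, and agreement at one nearby point suggests but does not prove the value of a limit.

```python
import math

A, B = 2.0, 3.0   # hypothetical values for the constants a and b

# Each entry pairs a quotient with the point at which its limit is taken
quotients = {
    "a": (lambda x: math.sin(A * x) / x, 0.0),
    "b": (lambda x: (x**3 + 3*x**2 - 2*x - 2) / (2*x**2 - x - 1), 1.0),
    "d": (lambda x: math.tan(3 * x) / math.tan(x), math.pi / 2),
    "e": (lambda x: (A / x) / (1.0 / math.tan(B * x)), 0.0),
}
# Values obtained in the text by L'Hospital's rule
rule_values = {"a": A, "b": 7.0 / 3.0, "d": 1.0 / 3.0, "e": A * B}


def sample(label, eps=1e-5):
    """Evaluate the quotient a small distance eps from the limit point."""
    q, x0 = quotients[label]
    return q(x0 + eps)


for label in quotients:
    print(label, sample(label), rule_values[label])

# Part (c): sin(3x)/x^3 evaluated at shrinking x, to be compared with +infinity
growth = [math.sin(3 * x) / x**3 for x in (1e-1, 1e-2, 1e-3)]
print(growth)
```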
5-3 (e) Identification of extrema
We return to the topic of extrema and, in particular, to the identification of
functional behaviour at stationary values by means of the mean value
theorem.
Suppose that a real valued function f(x) is differentiable in the interval
(a, b) and has a maximum at an interior point $x_0$ of (a, b).
Then if h is assumed to be positive and we consider the interval
$[x_0 - h, x_0]$ to the left of $x_0$, by the mean value theorem
$$\frac{f(x_0) - f(x_0 - h)}{h} = f'(\xi),$$
where $x_0 - h < \xi < x_0$.
Now by supposition h > 0 and as $x_0$ is a maximum, the numerator of
this expression will also be positive showing that $f'(\xi) > 0$. Hence by allowing
h to tend to zero, it follows that $\xi \to x_0$ and we have shown that to the
immediate left of the maximum we must have f' > 0.
To the right of the maximum, and in the interval $[x_0, x_0 + h]$, the same
argument shows that
$$\frac{f(x_0 + h) - f(x_0)}{h} = f'(\eta),$$
where $x_0 < \eta < x_0 + h$. This numerator is negative so that to the immediate
right of the maximum we must have f' < 0.
Similar arguments applied to a minimum and a point of inflection with
a horizontal tangent yield the following useful theorem, illustrated in Fig. 5-11.
Theorem 5-15 (identification of extrema using first derivative) If f(x) is a
real valued differentiable function in the neighbourhood of a point $x_0$ at
which $f'(x_0) = 0$ then:
(a) the function has a maximum at $x_0$ if f'(x) > 0 to the left of $x_0$ and
f'(x) < 0 to the right of $x_0$;
(b) the function has a minimum at $x_0$ if f'(x) < 0 to the left of $x_0$ and
f'(x) > 0 to the right of $x_0$;
(c) the function has a point of inflection with zero gradient at $x_0$ if
f'(x) has the same sign to the left and right of $x_0$.
Fig. 5-11 Stationary values of y = f(x): (a) local maximum; (b) local minimum;
(c) point of inflection with zero gradient.
In many books these results are regarded as intuitively obvious deductions
from the geometrical interpretation of a derivative in conjunction with the
behaviour of the graph of the function. However we have discussed them
formally here as an illustration of an important consequence of the mean
value theorem.
Example 5-10 We again consider the functions of Example 5-8.
Case (a) $f(x) = \tfrac{1}{3}x^3 + 2x^2 + 3x + 1$ with stationary points $x = \xi$ at
$\xi = -1$ and $\xi = -3$. As $f'(x) = x^2 + 4x + 3$ it follows that to the
immediate left of $\xi = -1$ we have f' < 0, whilst to the immediate right f' > 0
showing that $\xi = -1$ corresponds to a minimum. A similar argument shows
that $\xi = -3$ corresponds to a maximum.
Case (b) $f(x) = (2x - 5)x^{2/3}$ with the one stationary point $x = \xi$ at
$\xi = 1$. As $f'(x) = 2x^{2/3} + 2(2x - 5)/(3x^{1/3})$ it follows that f' < 0 to the
immediate left of $\xi = 1$ and f' > 0 to the immediate right. Hence $\xi = 1$
corresponds to a minimum. As Theorem 5-15 stands, since f is not
differentiable at the origin, the maximum that occurs there must be identified as in
Example 5-8. However a trivial modification of the proof would show that
results (a) and (b) of the theorem are still valid if f is not differentiable at $x_0$.
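The sign rules of Theorem 5-15 lend themselves to a simple numerical check. The Python sketch below is illustrative only: the helper names `cbrt` and `classify` and the offset h = 0.001 are assumptions, and sampling f' at two nearby points is a stand-in for the statement about its sign "to the immediate left and right".

```python
import math


def cbrt(x):
    """Real cube root, valid for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def fprime_a(x):
    # derivative of x^3/3 + 2x^2 + 3x + 1
    return x**2 + 4*x + 3


def fprime_b(x):
    # derivative of (2x - 5) x^(2/3), defined for x != 0
    return 2 * cbrt(x)**2 + 2 * (2*x - 5) / (3 * cbrt(x))


def classify(fprime, x0, h=1e-3):
    """Apply the sign rules of Theorem 5-15 at the points x0 - h and x0 + h."""
    left, right = fprime(x0 - h), fprime(x0 + h)
    if left > 0 and right < 0:
        return "maximum"
    if left < 0 and right > 0:
        return "minimum"
    if left * right > 0:
        return "point of inflection"
    return "undetermined"


print(classify(fprime_a, -1.0), classify(fprime_a, -3.0))
print(classify(fprime_b, 1.0), classify(fprime_b, 0.0))
print(classify(lambda x: 3 * x**2, 0.0))
```

The call at x = 0 for case (b) never evaluates f' at the origin itself, which is why the same sign test can still be used at a point where f is not differentiable.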
5-3 (f) Differentials
In using the notation dy/dx to represent the derivative of the dependent
variable y with respect to x we have thus far been careful to emphasize that
dy/dx is simply a number defined by a limit. Although suggestive of
increments, dy and dx taken separately have as yet no individual meaning. In
many applications, particularly in differential equations which we encounter
later, it is convenient to work with actual quantities dy and dx which we will
call differentials.
However differentials must obviously be defined in a manner consistent
with the notation dy/dx when it is used to denote the derivative with respect
to x of the function y defined by
$$y = f(x). \tag{5-8}$$
We achieve this by defining dy, the first-order differential of y, by
$$dy = f'(x)\,\Delta x, \tag{5-9}$$
where $\Delta x$ is an increment in x of arbitrary size.
However, if, for the moment, we regard the independent variable x as a
function of x we can write x = g(x) with g(x) = x. Then by the above
argument dx, the first-order differential of x, is defined by
$$dx = 1 \cdot \Delta x, \tag{5-10}$$
showing that we may with meaning write Eqn (5-9) in the form
$$dy = f'(x)\,dx. \tag{5-11}$$
When needed, the actual increment in y consequent upon an increment
$\Delta x$ in x will be denoted by $\Delta y$. In general the differential dy and the increment
$\Delta y$ are distinct quantities and the interrelationship between them is indicated
in Fig. 5-12.
Fig. 5-12 Differentials dx and $\Delta y$.
In more advanced treatments the use of differentials is strictly avoided on
account of logical difficulties encountered with their definition. However
they are so useful that we shall ignore these objections and use them freely
whenever necessary.
It is an immediate consequence of this that if
$$y = k_1 f(x) + k_2 g(x)$$
then by Theorem 5-4,
$$dy = k_1 f'(x)\,dx + k_2 g'(x)\,dx$$
or, equivalently, in symbolic notation
$$d(k_1 f + k_2 g) = k_1\,df + k_2\,dg. \tag{5-12}$$
If we have
$$y = f(x)g(x)$$
then by Theorem 5-5,
$$dy = g(x)f'(x)\,dx + f(x)g'(x)\,dx$$
or, equivalently, in symbolic notation
$$d(fg) = g\,df + f\,dg. \tag{5-13}$$
Finally, if
$$y = f(x)/g(x)$$
then by Theorem 5-8,
$$dy = \frac{g(x)f'(x)\,dx - f(x)g'(x)\,dx}{g^2(x)}$$
or, equivalently, in symbolic notation
$$d\left(\frac{f}{g}\right) = \frac{g\,df - f\,dg}{g^2}. \tag{5-14}$$
Example 5-11 If $f(x) = \sin(x^2 + 4)$ and $g(x) = x^3$ find the differentials:
(a) d(3f + g);
(b) d(fg);
(c) d(f/g).
Solution
(a) $d(3f + g) = d[3 \sin(x^2 + 4) + x^3]$
$= 3 \cos(x^2 + 4)\,d(x^2 + 4) + 3x^2\,dx$
$= 6x \cos(x^2 + 4)\,dx + 3x^2\,dx.$
(b) $d(fg) = d[x^3 \sin(x^2 + 4)]$
$= 3x^2 \sin(x^2 + 4)\,dx + x^3 \cos(x^2 + 4)\,d(x^2 + 4)$
$= 3x^2 \sin(x^2 + 4)\,dx + 2x^4 \cos(x^2 + 4)\,dx.$
(c) $d\left(\dfrac{f}{g}\right) = d\left[\dfrac{\sin(x^2 + 4)}{x^3}\right]$
$= \dfrac{x^3 \cos(x^2 + 4)\,d(x^2 + 4) - 3x^2 \sin(x^2 + 4)\,dx}{x^6}$
$= \dfrac{2x^2 \cos(x^2 + 4)\,dx - 3 \sin(x^2 + 4)\,dx}{x^4}.$
For small values of dx, the differential dy is obviously a reasonable
approximation to the actual increment $\Delta y$. This simple observation is often
utilized to relate small changes in dependent and independent variables as
the next example shows.
Example 5-12 The pressure p of a polytropic gas is related to the density $\rho$
by the expression
$$p = A\rho^{\gamma},$$
where A is a constant. Deduce the relationship connecting the differentials
dp and $d\rho$. Given that $\gamma = 3/2$ and $\rho = 4$, and taking dp as an approximation
to the actual pressure change $\Delta p$, compute the approximate new pressure if
$\rho$ is increased by 0.1. Compare the approximate and exact results.
Solution In this case $p = f(\rho)$ with $f(\rho) = A\rho^{\gamma}$. Hence $f'(\rho) = \gamma A\rho^{\gamma - 1}$ and
thus the desired differential relation is
$$dp = \gamma A\rho^{\gamma - 1}\,d\rho.$$
When $\gamma = 3/2$ and $\rho = 4$ it follows from the stated pressure-density
law that the initial pressure $p_0$ is
$$p_0 = 4^{3/2}A = 8A.$$
Using the differential relation to compute the approximate pressure increase
represented by the differential dp we find
$$dp = (3/2) \cdot A \cdot 4^{1/2} \cdot (0.1) = 0.3A.$$
Hence the approximate new pressure $p_0 + dp = 8.3A$.
The exact new pressure $p_0 + \Delta p$ may be computed from the pressure-
density law by setting $\rho = 4.1$ to obtain
$$p_0 + \Delta p = (4.1)^{3/2}A = 8.302A.$$
This shows that in this case the differential relation gives a good
approximation to the pressure increase.
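The comparison made in Example 5-12 is easily repeated by machine. In this Python sketch the constant A is set to 1 so that pressures are expressed in units of A; that normalisation, and the function names `pressure` and `pressure_differential`, are assumptions made for the illustration.

```python
def pressure(rho, A=1.0, gamma=1.5):
    """Pressure-density law p = A * rho**gamma."""
    return A * rho ** gamma


def pressure_differential(rho, d_rho, A=1.0, gamma=1.5):
    """Differential dp = gamma * A * rho**(gamma - 1) * d_rho."""
    return gamma * A * rho ** (gamma - 1.0) * d_rho


rho0, d_rho = 4.0, 0.1
p0 = pressure(rho0)
approx = p0 + pressure_differential(rho0, d_rho)   # p0 + dp
exact = pressure(rho0 + d_rho)                     # p0 + actual increment
print(p0, approx, exact, exact - approx)
```

The last printed number is the difference between the actual increment and the differential, which is small compared with dp itself for this choice of $d\rho$.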
5-4 Higher derivatives — applications
We have seen how differentiation applied to a suitable function f(x) yields as
a result another function f'(x), the derivative of f(x) with respect to x. If
the function f'(x) is itself differentiable then a repetition of differentiation
will result in a further function that we shall denote by f''(x) and will call the
second derivative of f(x) with respect to x. We may usefully employ the
dynamical problem that served to introduce the notion of a derivative to
give meaning to the notion of a second derivative, for if f'(x) represents a
velocity, then f''(x) represents an acceleration. If the function f''(x) is
itself differentiable then it is customary to denote the third derivative of f(x)
by f'''(x) after which, if necessary, further derivatives are conventionally
denoted by the use of superscript roman numerals. Hence the sixth derivative
of a suitably differentiable function f(x) would be written $f^{\mathrm{vi}}(x)$.
A better notation than this is needed for general purposes and the two
most often used because of their versatility are
$$\frac{d^n y}{dx^n} \quad\text{or}\quad D^n y.$$
These both represent the $n$th derivative with respect to $x$ of $y = f(x)$ and
for their determination require the successive application of differentiation
$n$ times. The number $n$ is the order of the derivative and the symbol $D$
symbolizes the operation of differentiation. Computationally the definition
of the $n$th derivative of $y$ with respect to $x$ is equivalent to using either of
these two equivalent algorithms
$$\frac{d}{dx}\left(\frac{d^{n-1}y}{dx^{n-1}}\right) = \frac{d^n y}{dx^n} \quad\text{or}\quad D[D^{n-1}y] = D^n y. \tag{5.15}$$
These expressions are, of course, only meaningful when $n$ is an integer
and we shall agree to the convention $D^0 y = y$.
Geometrically, the function $d^n y/dx^n$ bears to the graph of $d^{n-1}y/dx^{n-1}$
the same relationship as does the function $dy/dx$ to the graph of $y$. Namely,
$d^n y/dx^n$ at $x = x_0$ is the gradient of the graph of $d^{n-1}y/dx^{n-1}$ as a function
of $x$ at the same point $x = x_0$.
Example 5.13 Determine $dy/dx$, $d^2y/dx^2$, and $d^3y/dx^3$ given that $y = f(x)$
with:
(a) $f(x) = \cos mx$;
(b) $f(x) = \tan x$;
(c) $f(x) = 1/(1 + x)$.
If possible make deductions about the $n$th derivative.
Solution
(a)
$$\frac{dy}{dx} = f'(x) = \frac{d}{dx}(\cos mx) = -m \sin mx,$$
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d}{dx}[-m \sin mx] = -m^2 \cos mx,$$
$$\frac{d^3y}{dx^3} = \frac{d}{dx}\left(\frac{d^2y}{dx^2}\right) = \frac{d}{dx}[-m^2 \cos mx] = m^3 \sin mx.$$
An inductive argument easily shows that the $n$th derivative is
$$\frac{d^n}{dx^n}(\cos mx) = m^n \cos\left[mx + \frac{n\pi}{2}\right].$$
In respect of the function $y = \cos mx$, it is of importance to notice that
the simple algebraic equation
$$\frac{d^2y}{dx^2} + m^2 y = 0$$
connects the function and its second derivative. Because this equation
involves derivatives it is a differential equation. Such equations are very
important in both mathematics and the mathematical sciences; the last three
chapters of this book provide an introductory study of them.
(b)
$$\frac{dy}{dx} = f'(x) = \frac{d}{dx}(\tan x) = \sec^2 x,$$
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d}{dx}(\sec^2 x) = 2\sec^2 x \tan x,$$
$$\frac{d^3y}{dx^3} = \frac{d}{dx}\left(\frac{d^2y}{dx^2}\right) = \frac{d}{dx}(2\sec^2 x \tan x) = 2\sec^2 x\,(2\tan^2 x + \sec^2 x).$$
There is no simple rule by which $(d^n/dx^n)(\tan x)$ may be computed.
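(c)
$$\frac{dy}{dx} = f'(x) = \frac{d}{dx}\left(\frac{1}{1+x}\right) = \frac{-1}{(1+x)^2},$$
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d}{dx}\left[\frac{-1}{(1+x)^2}\right] = \frac{2}{(1+x)^3},$$
$$\frac{d^3y}{dx^3} = \frac{d}{dx}\left(\frac{d^2y}{dx^2}\right) = \frac{d}{dx}\left[\frac{2}{(1+x)^3}\right] = \frac{-3!}{(1+x)^4}.$$
It follows by induction that
$$\frac{d^n}{dx^n}\left(\frac{1}{1+x}\right) = \frac{(-1)^n\, n!}{(1+x)^{n+1}}.$$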
In general, functions are not capable of differentiation an indefinite
number of times, and at some stage they usually become non-differentiable.
A simple example of a function whose successive derivatives terminate,
though for a different reason from the above, is $x^n$, with $n$ a positive integer.
The $n$th derivative of $x^n$ is the constant number $n!$ so that the $(n + 1)$th and
all subsequent derivatives are identically zero.
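The recursive rule $D^n y = D[D^{n-1}y]$ and the remark about $x^n$ can be illustrated with a short sketch. Representing a polynomial by its list of coefficients is an assumption made here for illustration only, and the function names are hypothetical.

```python
import math

def differentiate(coeffs):
    """Differentiate c0 + c1*x + c2*x**2 + ... given as the list [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def nth_derivative(coeffs, n):
    """Apply D repeatedly: D^n y = D[D^(n-1) y], with D^0 y = y."""
    for _ in range(n):
        coeffs = differentiate(coeffs)
    return coeffs

n = 5
x_to_n = [0] * n + [1]                 # coefficients of x**5
print(nth_derivative(x_to_n, n))       # a single constant coefficient, n!
print(nth_derivative(x_to_n, n + 1))   # no coefficients remain: identically zero
```

An empty coefficient list stands for the zero polynomial in this representation.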
5.4 (a) Leibnitz's theorem
This useful theorem is a consequence of Theorem 5.5 and facilitates the
computation of high-order derivatives of the product $f(x)g(x)$ of the two
functions $f(x)$ and $g(x)$, in terms of the derivatives of the individual functions
$f(x)$ and $g(x)$ themselves.
The result is, perhaps, best expressed in terms of the symbolic differentiation
operator $D$, and for our starting point we now re-express the result of
Theorem 5.5 in terms of the operator $D$:
$$D(fg) = f\,Dg + g\,Df.$$
Assuming functions $f(x)$ and $g(x)$ are suitably differentiable, a further
application of the operator $D$ together with Theorem 5.5 yields
$$D^2(fg) = D(f\,Dg + g\,Df) = Df \cdot Dg + f\,D^2g + Dg \cdot Df + g\,D^2f.$$
However
$$Df \cdot Dg = \frac{df}{dx}\,\frac{dg}{dx} = \frac{dg}{dx}\,\frac{df}{dx} = Dg \cdot Df,$$
so that
$$D^2(fg) = f\,D^2g + 2\,Df \cdot Dg + g\,D^2f. \tag{5.16}$$
A repetition of the same argument shows that
$$D^3(fg) = f\,D^3g + 3\,Df \cdot D^2g + 3\,D^2f \cdot Dg + g\,D^3f. \tag{5.17}$$
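The coefficients involved in Eqns (5.16) and (5.17) are seen to belong to
the general pattern of binomial coefficients in the expansion of $(a + b)^n$,
namely to the rows of numbers
$$\begin{array}{lc}
(n = 0) & \binom{0}{0} \\
(n = 1) & \binom{1}{0}\ \binom{1}{1} \\
(n = 2) & \binom{2}{0}\ \binom{2}{1}\ \binom{2}{2} \\
(n = 3) & \binom{3}{0}\ \binom{3}{1}\ \binom{3}{2}\ \binom{3}{3}
\end{array}$$
or, equivalently, to the rows
$$\begin{array}{lc}
(n = 0) & 1 \\
(n = 1) & 1 \quad 1 \\
(n = 2) & 1 \quad 2 \quad 1 \\
(n = 3) & 1 \quad 3 \quad 3 \quad 1
\end{array}$$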
This suggests that in evaluating $D^n(fg)$, the coefficients arising should
belong to the $(n + 1)$th row of either of these arrays, which are Pascal
triangles. That this is so can be proved fairly easily, using an inductive
argument similar to that used to prove the binomial theorem. We shall not give
the details, preferring simply to state the theorem.
Theorem 5.16 (Leibnitz's theorem) If $f(x)$ and $g(x)$ are $n$ times differentiable
real valued functions in the interval $(a, b)$, then
$$D^n(fg) = \sum_{k=0}^{n} \binom{n}{k}\, D^{n-k}f \cdot D^k g.$$
The value and power of this is best shown by an application.
Example 5.14 Use Leibnitz's theorem to evaluate $(d^3/dx^3)(x^6 \sin x)$.
Solution Setting $n = 3$ in the general result gives
$$D^3(fg) = g\,D^3f + 3\,D^2f \cdot Dg + 3\,Df \cdot D^2g + f\,D^3g.$$
This is, of course, result (5.17) differently expressed. Now we make the
identifications $f(x) = x^6$ and $g(x) = \sin x$ when it follows that $Df = 6x^5$,
$D^2f = 30x^4$, $D^3f = 120x^3$, and $Dg = \cos x$, $D^2g = -\sin x$, $D^3g = -\cos x$.
Hence substitution into the above result gives
$$D^3(x^6 \sin x) = 120x^3 \sin x + 90x^4 \cos x - 18x^5 \sin x - x^6 \cos x.$$
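As a numerical cross-check of Example 5.14, the following sketch evaluates the Leibnitz sum at a single point and compares it with the closed form obtained above. The sample point and the helper names are arbitrary choices made for illustration.

```python
import math

def leibnitz(f_derivs, g_derivs, n):
    """D^n(fg) = sum over k of C(n, k) * D^(n-k) f * D^k g, evaluated at one point.

    f_derivs[j] and g_derivs[j] hold the values of the j-th derivatives of f and g.
    """
    return sum(math.comb(n, k) * f_derivs[n - k] * g_derivs[k] for k in range(n + 1))

def third_derivative_example(x):
    """Closed form found in Example 5.14 for D^3(x^6 sin x)."""
    return (120 * x**3 * math.sin(x) + 90 * x**4 * math.cos(x)
            - 18 * x**5 * math.sin(x) - x**6 * math.cos(x))

x = 1.3  # arbitrary sample point
f_derivs = [x**6, 6 * x**5, 30 * x**4, 120 * x**3]                  # f, Df, D^2 f, D^3 f
g_derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]   # g, Dg, D^2 g, D^3 g
print(leibnitz(f_derivs, g_derivs, 3), third_derivative_example(x))
```

The same `leibnitz` helper works for any order $n$, provided the two lists contain derivative values up to order $n$.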
5.4 (b) Identification of extrema by second derivatives
An important application of the second derivative of a function $f(x)$ is to the
identification of the nature of its extrema. Let us suppose that $f(x)$ is twice
differentiable and that $f'(x_0) = 0$ and $f''(x_0) = L < 0$.
Then from Definition 5.2 and the notion of a second derivative we must
have that
$$f''(x_0) = \lim_{x \to x_0} \frac{f'(x) - f'(x_0)}{x - x_0} = L < 0.$$
By supposition $f'(x_0) = 0$, so that
$$f''(x_0) = \lim_{x \to x_0} \frac{f'(x)}{x - x_0} = L < 0.$$
This limit must be independent of the manner in which $x$ approaches $x_0$
so that we must consider separately the cases that $x$ lies to the left or to the
right of $x_0$.
If $x$ lies to the left of $x_0$ then $x - x_0 < 0$. Consequently, as the value $L$
of the limit is negative, the expression defining $f''(x_0)$ implies that to the
immediate left of $x_0$ it must be true that $f'(x) > 0$.
If $x$ lies to the right of $x_0$ then $x - x_0 > 0$. Consequently, as the value $L$
of the limit is negative, the expression defining $f''(x_0)$ implies that to the
immediate right of $x_0$ it must be true that $f'(x) < 0$.
These results, in conjunction with Theorem 5.15 (a), prove that at a
stationary value $x_0$, for which $f''(x_0) < 0$, the function $f(x)$ attains a maximum
value. An exactly similar argument proves that at a stationary value $x_0$, for
which $f''(x_0) > 0$, the function $f(x)$ attains a minimum value.
To complete the argument, consider the situation in which $f''(x_0) = 0$.
It might be conjectured that this corresponds to a point of inflection; and to
establish the correctness of our intuition let us appeal to the geometrical
interpretation of a derivative as a gradient.
Suppose that $x_0$ corresponds to a point of inflection with zero gradient.
Then as $x$ increases through the value $x_0$, either
(a) $f'(x)$ is initially positive and decreases to a minimum value $f'(x_0) = 0$,
thereafter increasing again (cf. Fig. 5.11 (c));
or,
(b) $f'(x)$ is initially negative and increases to a maximum value $f'(x_0) = 0$,
thereafter decreasing again.
In each case $x_0$ is a stationary value of the first derivative $f'(x)$, so that by
an application of Theorem 5.10 to the function $f'(x)$ we find that $f''(x_0) = 0$
at a point of inflection.
We have thus proved the following theorem.
Theorem 5.17 (identification of extrema using second derivatives) Let
$f(x)$ be a real valued twice differentiable function in $(a, b)$ with a stationary
point $x_0$ in $(a, b)$, so that $f'(x_0) = 0$. Then, if
(a) $f''(x_0) < 0$ the function $f(x)$ has a maximum at $x_0$,
(b) $f''(x_0) > 0$ the function $f(x)$ has a minimum at $x_0$,
(c) $f''(x_0) = 0$ the function $f(x)$ has a point of inflection at $x_0$ with zero
gradient provided that the sign of $f'(x)$ is the same to the immediate
left and right of $x_0$.
The proof of this theorem shows clearly what was asserted earlier; namely
that a point of inflection on the graph of a function separates a region of
convexity from a region of concavity. There is, of course, no necessity that
this point should have associated with it a zero gradient.
Following this argument to its logical conclusion we see that the proof of
(c) above need only involve the sign of $f'(x)$ to the left and right of $x_0$ when
$f'(x_0) = 0$, for then such arguments are needed to distinguish between an
extremum and a point of inflection. If $f'(x_0) \neq 0$ such problems do not arise
and it is sufficient to look for those values $\xi$ for which $f''(\xi) = 0$. We have
thus proved the following general result.
Theorem 5.18 (location of points of inflection) If $f(x)$ is a real valued
twice differentiable function then its points of inflection, if any, occur at the
numbers $\xi$ for which $f''(\xi) = 0$ provided that $f'(\xi) \neq 0$. If however this is
not so, and $f'(\xi) = 0$, then $\xi$ corresponds to a point of inflection provided
that the sign of $f'(x)$ is the same to the immediate left and right of $\xi$.
It is left to the reader as an exercise to prove that when $f'(x_0) = f''(x_0) = 0$,
then provided $f'''(x_0)$ exists, our condition on $f'(x)$ may be replaced by the
requirement $f'''(x_0) \neq 0$. The proof is essentially similar to that given for
Theorem 5.17 though this time the starting point is the definition of $f'''(x_0)$
expressed as a limit. We give this result as a corollary.
Corollary 5.18 If $f(x)$ is a real valued thrice differentiable function and
$f'(\xi) = f''(\xi) = 0$, then $f(x)$ has a point of inflection at $x = \xi$ if $f'''(\xi) \neq 0$.
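The exercise behind this corollary can be sketched as follows. This is one possible route, modelled on the argument used for Theorem 5.17, and is not necessarily the proof the author has in mind; it assumes only that $f'(x_0) = f''(x_0) = 0$ and that $f'''(x_0)$ exists and is non-zero.

```latex
\begin{align*}
f'''(x_0) &= \lim_{x \to x_0} \frac{f''(x) - f''(x_0)}{x - x_0}
           = \lim_{x \to x_0} \frac{f''(x)}{x - x_0} = L \neq 0. \\[4pt]
L > 0:&\quad f''(x) < 0 \ \text{for } x < x_0, \qquad f''(x) > 0 \ \text{for } x > x_0
        \quad (x \text{ close to } x_0), \\
      &\quad \text{so } f'(x) \text{ has a minimum at } x_0
        \text{ and } f'(x) > f'(x_0) = 0 \text{ on both sides of } x_0. \\[4pt]
L < 0:&\quad f''(x) > 0 \ \text{for } x < x_0, \qquad f''(x) < 0 \ \text{for } x > x_0
        \quad (x \text{ close to } x_0), \\
      &\quad \text{so } f'(x) \text{ has a maximum at } x_0
        \text{ and } f'(x) < f'(x_0) = 0 \text{ on both sides of } x_0.
\end{align*}
```

In either case the sign of $f'(x)$ is the same to the immediate left and right of $x_0$, which is the condition required in the second part of Theorem 5.18.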
Example 5.15 Locate and identify the stationary values of the following
functions. Find any points of inflection they may have, together with the
gradient of the tangent line at such points:
(a) $f(x) = x^3 - 12x + 1$ in $[-10, 10]$;
(b) $f(x) = \tan x$ in $[-\tfrac{1}{4}\pi, \tfrac{1}{4}\pi]$;
(c) $f(x) = (x - 1)^3$ in $(-\infty, \infty)$.
Solution (a) The stationary values are those numbers $\xi$ for which $f'(\xi) = 0$.
Hence as $f'(x) = 3x^2 - 12$, the stationary values are determined by the
equation
$$3\xi^2 - 12 = 0.$$
This has roots $\xi = 2$, $\xi = -2$ which both lie in $[-10, 10]$ and are the desired
stationary values. As $f''(x) = 6x$, it follows that $f''(2) = 12 > 0$ and
$f''(-2) = -12 < 0$. Hence by Theorem 5.17, the point $\xi = 2$ is a minimum and the
point $\xi = -2$ is a maximum. Since the function has no other stationary
value there can be no point of inflection at which the tangent line has zero
gradient. However $f''(x) = 6x$ vanishes when $x = 0$, so that by Theorem
5.18 we see that $x = 0$ must correspond to a point of inflection. The gradient
at $x = 0$ is $f'(0) = -12$ which is the gradient of the desired tangent line to
the graph at the point of inflection.
(b) Here we have $f'(x) = \sec^2 x$ and clearly, since $\sec^2 x = 1 + \tan^2 x$,
it follows that $f'(x) \neq 0$ in $[-\tfrac{1}{4}\pi, \tfrac{1}{4}\pi]$. The function $f(x) = \tan x$ thus has no
stationary values in $[-\tfrac{1}{4}\pi, \tfrac{1}{4}\pi]$, though it assumes its greatest value at $\tfrac{1}{4}\pi$
and its least value at $-\tfrac{1}{4}\pi$. We have $f''(x) = 2\sec^2 x \tan x$ which vanishes
for $x = 0$. Hence by Theorem 5.18, the function $\tan x$ has a point of inflection
at the origin at which the gradient of the tangent to the graph has the value
$f'(0) = 1$.
(c) We see that $f'(x) = 3(x - 1)^2$ and so the condition $f'(\xi) = 0$ yields
$\xi = 1$ as the single stationary value. However, $f''(x) = 6(x - 1)$ which shows
that we also have $f''(1) = 0$. Appealing to the last part of Theorem 5.18 we
see that, as $f'(x) = 3(x - 1)^2 > 0$ to both the left and right of $x = 1$, it
follows that $f(x) = (x - 1)^3$ has a point of inflection at that point. The tangent
line to the graph there has a zero gradient. Alternatively, as $f'''(x) = 6 \neq 0$,
the result also follows from Corollary 5.18.
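The second-derivative test of Theorem 5.17 lends itself to a small sketch. The derivatives below are entered by hand from parts (a) and (c) of Example 5.15, and the function name `classify_stationary_point` is a hypothetical helper, not something defined in the text.

```python
def classify_stationary_point(second_derivative, x0):
    """Apply the second-derivative test at a point x0 where f'(x0) = 0."""
    value = second_derivative(x0)
    if value < 0:
        return "maximum"
    if value > 0:
        return "minimum"
    return "undecided: inspect the sign of f' on either side"

# Example 5.15 (a): f(x) = x**3 - 12*x + 1
f_prime = lambda x: 3 * x**2 - 12
f_second = lambda x: 6 * x

stationary = [2.0, -2.0]                      # roots of 3*x**2 - 12 = 0
assert all(f_prime(x) == 0 for x in stationary)
results = {x: classify_stationary_point(f_second, x) for x in stationary}
print(results)

# Example 5.15 (c): f(x) = (x - 1)**3 has f'(1) = f''(1) = 0
print(classify_stationary_point(lambda x: 6 * (x - 1), 1.0))
```

When the test is undecided, as in part (c), the sign of $f'(x)$ on either side of the point, or the third derivative, must be examined instead.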
5.5 Partial differentiation
The notion of continuity has already been extended so that it is meaningful
in the context of functions of several independent variables. It is now
appropriate to extend the notion of a derivative in a similar fashion. For simplicity
of argument we shall work with the function $f(x, y)$ of two independent
variables, and in order to visualize its behaviour geometrically we will define
a dependent variable by the equation
$$u = f(x, y). \tag{5.18}$$
The function may then be represented as a surface in three dimensional
space.
A typical surface generated by a function of the form of Eqn (5.18) is
shown in Fig. 5.13 and, unlike functions of one independent variable, it is
necessary to define more than one first-order derivative. The idea involved is
simple: by holding one of the independent variables in $f$ constant at some
value of interest, the function $f$ then becomes a function of the single remaining
independent variable. We may then differentiate $f$ as though it were a
function only of that one variable. By holding first x and then y constant in
this manner, two different derivatives may be defined which, because of their
manner of computation, will be called partial derivatives to distinguish them
from our earlier use of the term derivative. We shall now express these ideas
formally as a definition and set down the standard notation to be used.
Fig. 5.13 Geometrical interpretation of partial derivatives.
Definition 5.5 (partial derivatives) Let $f(x, y)$ be a function defined near
$(x_0, y_0)$. Suppose that
$$\lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} \tag{A}$$
exists and is independent of the direction of approach of $x$ to $x_0$. Then $f$ is
differentiable partially with respect to $x$ at $(x_0, y_0)$. The value of the limit is
denoted by $f_x(x_0, y_0)$ or by $\partial f/\partial x|_{(x_0, y_0)}$ and called the first-order partial
derivative of $f$ with respect to $x$ at $(x_0, y_0)$.
Similarly, suppose that
$$\lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0} \tag{B}$$
exists and is independent of the direction of approach of $y$ to $y_0$. Then $f$ is
differentiable partially with respect to $y$ at $(x_0, y_0)$. The limit is denoted by
$f_y(x_0, y_0)$ or by $\partial f/\partial y|_{(x_0, y_0)}$ and called the first-order partial derivative of $f$
with respect to $y$ at $(x_0, y_0)$.
By analogy with ordinary derivatives, if $f(x, y)$ is differentiable partially
with respect to $x$ and $y$ at all points of some region in the $(x, y)$-plane and
these derivatives are continuous, then we say $f$ is differentiable in that region.
The operations of partial differentiation with respect to $x$ and $y$ are usually
denoted by the differentiation operators $\partial/\partial x$ and $\partial/\partial y$, respectively.
Let us now interpret these definitions in terms of Fig. 5.13. The function
$f(x, y_0)$ occurring in the numerator of limit (A) in Definition 5.5 is represented
in that figure by the intersection of the surface $u = f(x, y)$ with the plane
$y = y_0$ which has been labelled $\Pi_1$. It is the curve $L_1$. The number $f_x(x_0, y_0)$
defined by limit (A) is the gradient of the tangent line $l_1$ to this curve at point
$P$. By requiring the limit to be independent of the direction of approach of $x$
to $x_0$, we have ensured that the tangent lines drawn to the curve at $P$, whether
from the left or the right, will have the same gradient. In simpler terms this
ensures that the curve $L_1$ is smooth and has no kink at $P$.
The function $f(x_0, y)$ occurring in the numerator of limit (B) in the definition
is represented in Fig. 5.13 by the intersection of the surface $u = f(x, y)$
with the plane $x = x_0$ which has been labelled $\Pi_2$. It is the curve $L_2$. The
number $f_y(x_0, y_0)$ defined by limit (B) is the gradient of the tangent line $l_2$
to this curve at point $P$.
Thus by differentiating partially we mean that, during the process of
differentiation, the other independent variable is to be regarded as a constant.
In consequence, all the rules of differentiation developed for functions of a
single variable are also rules of partial differentiation, provided only that the
functions involved are suitably differentiable. On account of this when, for
example, the operator $\partial/\partial x$ acts on a function only of $y$, say $g(y)$, that function
is to be regarded as a constant with respect to this operator and so
$(\partial/\partial x)[g(y)] = 0$. Similarly $(\partial/\partial y)[h(x)] = 0$.
Example 5.16 In each of the following cases compute $f_x$ and $f_y$ as functions of
$x$ and $y$. Use the result to determine the numerical value of these derivatives
at the stated points:
(a) $f(x, y) = x^3 + 2xy + 2y^2$; $(1, 2)$;
(b) $f(x, y) = x \sin xy + 3$; $(1, \tfrac{1}{2}\pi)$;
(c) $f(x, y) = x/(x^2 + y^2)$; $(1, 0)$.
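Solution
(a)
$$f_x = \frac{\partial}{\partial x}[x^3 + 2xy + 2y^2]
     = \frac{\partial}{\partial x}[x^3] + 2y\frac{\partial}{\partial x}[x] + 2y^2\frac{\partial}{\partial x}[1],$$
whence
$$\frac{\partial f}{\partial x} = 3x^2 + 2y.$$
At the point $(1, 2)$ we find that $\partial f/\partial x|_{(1,2)} = 7$. Similarly,
$$f_y = \frac{\partial}{\partial y}[x^3 + 2xy + 2y^2]
     = x^3\frac{\partial}{\partial y}[1] + 2x\frac{\partial}{\partial y}[y] + 2\frac{\partial}{\partial y}[y^2],$$
whence
$$\frac{\partial f}{\partial y} = 2x + 4y.$$
At the point $(1, 2)$ we find that $\partial f/\partial y|_{(1,2)} = 10$.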
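(b)
$$f_x = \frac{\partial}{\partial x}[x \sin xy + 3]
     = x\frac{\partial}{\partial x}[\sin xy] + \sin xy\,\frac{\partial}{\partial x}[x] + \frac{\partial}{\partial x}[3],$$
whence
$$\frac{\partial f}{\partial x} = xy \cos xy + \sin xy.$$
At the point $(1, \tfrac{1}{2}\pi)$ we find that $\partial f/\partial x|_{(1, \pi/2)} = 1$. Similarly,
$$f_y = \frac{\partial}{\partial y}[x \sin xy + 3]
     = x\frac{\partial}{\partial y}[\sin xy] + \frac{\partial}{\partial y}[3],$$
whence
$$\frac{\partial f}{\partial y} = x^2 \cos xy$$
and
$$\left.\frac{\partial f}{\partial y}\right|_{(1, \pi/2)} = 0.$$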
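(c)
$$f_x = \frac{\partial}{\partial x}\left[\frac{x}{x^2 + y^2}\right]
     = \frac{1}{x^2 + y^2}\frac{\partial}{\partial x}[x] + x\frac{\partial}{\partial x}\left[(x^2 + y^2)^{-1}\right]
     = \frac{1}{x^2 + y^2} - \frac{x}{(x^2 + y^2)^2}\frac{\partial}{\partial x}[x^2 + y^2],$$
whence
$$\frac{\partial f}{\partial x} = \frac{1}{x^2 + y^2} - \frac{2x^2}{(x^2 + y^2)^2} = \frac{y^2 - x^2}{(x^2 + y^2)^2}.$$
At the point $(1, 0)$ we find that $\partial f/\partial x|_{(1,0)} = -1$. Similarly,
$$f_y = \frac{\partial}{\partial y}\left[\frac{x}{x^2 + y^2}\right]
     = x\frac{\partial}{\partial y}\left[(x^2 + y^2)^{-1}\right]
     = \frac{-x}{(x^2 + y^2)^2}\frac{\partial}{\partial y}[x^2 + y^2],$$
whence
$$\frac{\partial f}{\partial y} = \frac{-2xy}{(x^2 + y^2)^2}$$
and so
$$\left.\frac{\partial f}{\partial y}\right|_{(1,0)} = 0.$$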
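The values found in Example 5.16 can be checked numerically by holding one variable fixed and forming a difference quotient in the other, in the spirit of limits (A) and (B). The sketch below uses central differences with a step `h` chosen arbitrarily; the helper names are hypothetical.

```python
import math

def partial_x(f, x0, y0, h=1e-6):
    """Central-difference estimate of f_x: y is held constant at y0."""
    return (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)

def partial_y(f, x0, y0, h=1e-6):
    """Central-difference estimate of f_y: x is held constant at x0."""
    return (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

cases = [
    # (function, point, expected f_x, expected f_y) taken from Example 5.16
    (lambda x, y: x**3 + 2*x*y + 2*y**2, (1.0, 2.0), 7.0, 10.0),
    (lambda x, y: x * math.sin(x*y) + 3, (1.0, math.pi / 2), 1.0, 0.0),
    (lambda x, y: x / (x**2 + y**2), (1.0, 0.0), -1.0, 0.0),
]
estimates = [(partial_x(f, *pt), partial_y(f, *pt)) for f, pt, _, _ in cases]
for (fx, fy), (_, pt, ex, ey) in zip(estimates, cases):
    print(pt, round(fx, 6), round(fy, 6), "expected", ex, ey)
```

A difference quotient only approximates the limit, so agreement is to within a small tolerance rather than exact.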
The notion of partial differentiation extends to functions of more than two
independent variables in an obvious manner. Suppose that the function
$f(x, y, z)$ is defined near the point $(x_0, y_0, z_0)$ then, provided the limits exist,
we define the three first-order partial derivatives $f_x$, $f_y$, and $f_z$ by the expressions
$$\left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0, z_0)} = \lim_{x \to x_0} \frac{f(x, y_0, z_0) - f(x_0, y_0, z_0)}{x - x_0},$$
$$\left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0, z_0)} = \lim_{y \to y_0} \frac{f(x_0, y, z_0) - f(x_0, y_0, z_0)}{y - y_0},$$
$$\left.\frac{\partial f}{\partial z}\right|_{(x_0, y_0, z_0)} = \lim_{z \to z_0} \frac{f(x_0, y_0, z) - f(x_0, y_0, z_0)}{z - z_0}.$$
Clearly a function of n independent variables will have n different first-
order partial derivatives ; one with respect to each of the independent variables.
The actual computation of these partial derivatives is carried out exactly as
before.
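Example 5.17 Find the first-order partial derivatives of
$$f(x, y, z) = x^3y^2 + 3 \sin yz + 2.$$
Solution This function has three independent variables so we must obtain
three first-order partial derivatives, namely $f_x$, $f_y$, and $f_z$. First we have
$$f_x = \frac{\partial}{\partial x}[x^3y^2 + 3 \sin yz + 2]
     = y^2\frac{\partial}{\partial x}[x^3] + 3 \sin yz\,\frac{\partial}{\partial x}[1] + \frac{\partial}{\partial x}[2],$$
so
$$\frac{\partial f}{\partial x} = 3x^2y^2.$$
Next,
$$f_y = \frac{\partial}{\partial y}[x^3y^2 + 3 \sin yz + 2]
     = x^3\frac{\partial}{\partial y}[y^2] + 3\frac{\partial}{\partial y}[\sin yz] + \frac{\partial}{\partial y}[2],$$
so
$$\frac{\partial f}{\partial y} = 2x^3y + 3z \cos yz.$$
Finally,
$$f_z = \frac{\partial}{\partial z}[x^3y^2 + 3 \sin yz + 2]
     = x^3y^2\frac{\partial}{\partial z}[1] + 3\frac{\partial}{\partial z}[\sin yz] + \frac{\partial}{\partial z}[2],$$
so
$$\frac{\partial f}{\partial z} = 3y \cos yz.$$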
5.6 Total differential
The idea of a differential, that was useful in ordinary differentiation, may
also be developed to advantage in connection with partial differentiation.
We first approach this problem from the geometrical standpoint, and then
indicate how an analytical counterpart of these arguments can be produced.
Let us consider Eqn (5.18) and its geometrical representation in Fig. 5.13.
The conditions for differentiability at $P$ ensure that the surface has a tangent
plane $\Pi$ at that point (why?), and it is to this plane that we now confine
our attention. An element of this tangent plane defined by the lines $l_1$ and $l_2$
through $P$ is depicted in Fig. 5.14. Obviously points on $\Pi$ close to $P$ must also
be close to those points on the surface $u = f(x, y)$ that lie vertically below
them. This suggests that for such points, the element of plane $\Pi$ neighbouring
$P$ represents a good approximation to the element of the curved surface
defining the function $u$ near to $P$. Thus variations of $u$ close to $P$ may, with
propriety, be approximated by the variations of the corresponding points on $\Pi$.
Since we are interested in variations of $u$ about the point $P$ at which
$u = f(x_0, y_0)$, we shall start by translating our coordinate axes without
rotation to the point $P$. In this position the new $x$, $y$, and $u$ coordinate axes
will be denoted by $x'$, $y'$, and $u'$, respectively, as shown in Fig. 5.15.
If, relative to $P$, the $x'$ and $y'$ coordinates of a point $P'$ are $\Delta x$ and $\Delta y$,
then it is obvious from Fig. 5.15 that the increment $du$ must be
$$du = \Delta x \tan\alpha + \Delta y \tan\beta,$$
where $\alpha$ and $\beta$ are the angles between the lines $l_1$ and $l_2$ and the $x'$- and $y'$-axes,
respectively.
However, by the definition of $f_x$ and $f_y$, we have
$$f_x(x_0, y_0) = \tan\alpha, \qquad f_y(x_0, y_0) = \tan\beta,$$
so that
$$du = f_x(x_0, y_0)\,\Delta x + f_y(x_0, y_0)\,\Delta y. \tag{5.19}$$
Fig. 5.14 Tangent plane $\Pi$ to surface $u = f(x, y)$ at point $P$.
We now define differentials $dx$ and $dy$ in the independent variables $x$ and $y$
by setting $dx = \Delta x$ and $dy = \Delta y$. Expression (5.19) then becomes
$$du = f_x(x_0, y_0)\,dx + f_y(x_0, y_0)\,dy, \tag{5.20}$$
which is the relationship by which we define the total differential $du$ of the
function $u = f(x, y)$. This is so called because it takes account of the total
effect, on $u$, of the changes $dx$ in $x$ and $dy$ in $y$. The additive effect of these
changes is clearly apparent in Fig. 5.15 and results from using a tangent plane
approximation to the surface near $P$. As before, when $dx$ and $dy$ are suitably
small, $du$ is a reasonable approximation to the true change $\Delta u$ given by
$$\Delta u = f(x_0 + dx, y_0 + dy) - f(x_0, y_0). \tag{5.21}$$
An analytic rather than geometric justification of the tangent plane
approximation used to define $du$ in Eqn (5.20) can be based on Theorem 5.12.
Equation (5.21), which is exact, is taken to be the starting point and by
addition and subtraction of a term $f(x_0, y_0 + \Delta y)$, is written
$$\Delta u = [f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0, y_0 + \Delta y)]
          + [f(x_0, y_0 + \Delta y) - f(x_0, y_0)],$$
where, in the first bracket, only the first argument changes and, in the second
bracket, only the second argument changes.
Then Theorem 5.12 expressed in the Cauchy form may be applied to the
first bracket with respect to $x$ and to the second bracket with respect to $y$ to
yield
$$\Delta u = \Delta x\, f_x(x_0 + \xi\,\Delta x, y_0 + \Delta y) + \Delta y\, f_y(x_0, y_0 + \eta\,\Delta y), \tag{5.22}$$
where $0 < \xi < 1$ and $0 < \eta < 1$. Partial derivatives have been used here
because, although in the first bracket it is only $x$ that varies whilst in the second
bracket it is only $y$ that varies, both brackets are nevertheless functions of
$x$ and $y$.
Fig. 5.15 Element of tangent plane.
Result (5.20) then follows by letting $\Delta x$ and $\Delta y$ become small. The
continuity of $f_x(x_0 + \xi\,\Delta x, y_0 + \Delta y)$ allows it to be approximated by
$f_x(x_0, y_0)$ with an error $\varepsilon_1$ and, similarly, the continuity of $f_y(x_0, y_0 + \eta\,\Delta y)$
allows it to be approximated by $f_y(x_0, y_0)$ with an error $\varepsilon_2$. Then, as $\Delta x$,
$\Delta y \to 0$, so also do $\varepsilon_1$ and $\varepsilon_2$. It is left as an exercise for the reader to supply
the details necessary to make this argument rigorous. If Eqn (5.20) is defined
for all points $(x_0, y_0)$ of some region in the $(x, y)$-plane, then the suffix zero
may be discarded and Eqn (5.20) can then be regarded as a functional
relationship rather than a result that is true only near one point.
We have thus proved a special case of the following more general result
whose proof differs in no significant detail.
Theorem 5.19 (total differential) Let $f(x_1, x_2, \ldots, x_n)$ be a real valued
function of $n$ real variables and let its first-order partial derivatives exist and
be continuous in some region $\mathcal{R}$. Then the total differential $du$ of the function
$u = f(x_1, x_2, \ldots, x_n)$ in the region $\mathcal{R}$ is given by
$$du = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n.$$
If we consider the surface generated by setting $u$ = constant, then on that
surface $du = 0$. Theorem 5.19 then takes the form
$$0 = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n, \tag{5.23}$$
showing that the differentials $dx_1, dx_2, \ldots, dx_n$ are no longer independent
since this constraint condition has been imposed on them. This is of course
to be expected, since we have imposed the single condition $f(x_1, x_2, \ldots, x_n)$
= constant on the independent variables $x_1, x_2, \ldots, x_n$ so that we are no
longer free to change them arbitrarily. Indeed, if differentials $dx_1, dx_2, \ldots,
dx_{n-1}$ are chosen arbitrarily, then the remaining differential $dx_n$ is uniquely
determined by Eqn (5.23). If we call the number of independent variables the
number of degrees of freedom associated with the equation $u = f(x_1, x_2, \ldots, x_n)$,
then Eqn (5.23) implies the loss of a single degree of freedom.
Example 5.18 In thermodynamics, the pressure $p$ of an ideal gas, its volume
$V$, its absolute temperature $T$ and the gas constant $R$ are related by the ideal
gas law $pV = RT$. Find the expression relating the total differential $dp$ and
the differentials $dV$ and $dT$.
Solution We have $p = RT/V$, and so $p = f(T, V)$ with $f(T, V) = RT/V$.
Hence $\partial f/\partial T = R/V$ and $\partial f/\partial V = -RT/V^2$. Now interpreting Theorem 5.19
in this case we find
$$dp = \left(\frac{\partial f}{\partial T}\right)dT + \left(\frac{\partial f}{\partial V}\right)dV, \tag{$*$}$$
and so
$$dp = \frac{R}{V}\,dT - \frac{RT}{V^2}\,dV.$$
Notice that the use of the symbol $f$ in the total differential relation ($*$) to
bring it into accord with the notation of Theorem 5.19 is not strictly necessary
since $p = f$. We could equally well have written equation ($*$) as
$$dp = \left(\frac{\partial p}{\partial T}\right)dT + \left(\frac{\partial p}{\partial V}\right)dV,$$
and used the immediately obvious result that
$$\frac{\partial p}{\partial T} = \frac{R}{V} \quad\text{and}\quad \frac{\partial p}{\partial V} = -\frac{RT}{V^2}.$$
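To see how well the total differential of Example 5.18 tracks the true change in pressure, the sketch below compares $dp$ with $\Delta p$ for small changes in $T$ and $V$. The numerical state, the increments, and the choice of one mole of gas with $R$ in SI units are all assumptions made for illustration.

```python
R = 8.314  # gas constant in J/(mol K); one mole of gas is assumed

def pressure(T, V):
    """Ideal gas law solved for p: p = R*T/V."""
    return R * T / V

def total_differential_p(T, V, dT, dV):
    """dp = (R/V) dT - (R*T/V**2) dV, from Theorem 5.19."""
    return (R / V) * dT - (R * T / V**2) * dV

T0, V0 = 300.0, 0.025        # hypothetical state: kelvin, cubic metres
dT, dV = 0.5, 0.0001         # small changes in T and V

dp = total_differential_p(T0, V0, dT, dV)
delta_p = pressure(T0 + dT, V0 + dV) - pressure(T0, V0)
print(f"dp = {dp:.3f} Pa, true change = {delta_p:.3f} Pa")
```

Halving `dT` and `dV` should roughly quarter the gap between the two values, since the neglected terms are of second order in the increments.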
Let us now consider the function $u = f(x, y)$ and, as a special case, set
$u = 0$ so that the equation
$$f(x, y) = 0$$
defines $y$ implicitly in terms of $x$. How then may we compute the derivative
$dy/dx$ without solving for $y$ in terms of $x$? The solution to this problem is
provided by Eqn (5.23), which in this case takes the form
$$0 = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$
We saw in connection with the definition of the differentials $dy$ and $dx$ in
Eqn (5.11), that the function $(dy/dx)$, called the derivative of $y$ with respect
to $x$, is the ratio $dy : dx$ of the differentials. Hence dividing by the differential
$dx$, assuming that $\partial f/\partial y \neq 0$, and rearranging gives the result
$$\frac{dy}{dx} = \frac{-(\partial f/\partial x)}{(\partial f/\partial y)}.$$
We state this as a corollary to Theorem 5.19.
Corollary 5.19 (a) If the real variables $x$ and $y$ are related implicitly by the
equation $f(x, y) = 0$, and the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ exist and
are continuous, then
$$\frac{dy}{dx} = -\left(\frac{\partial f}{\partial x}\right)\Big/\left(\frac{\partial f}{\partial y}\right)$$
whenever $\partial f/\partial y \neq 0$. Insistence on this latter condition may be avoided by
writing the result in the alternative form
$$\left(\frac{\partial f}{\partial y}\right)\frac{dy}{dx} + \frac{\partial f}{\partial x} = 0.$$
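The situation is slightly different if three variables $x$, $y$, $z$ are involved and
$z$, say, is defined implicitly in terms of the independent variables $x$ and $y$ by
the equation
$$f(x, y, z) = 0.$$
In these circumstances it is frequently necessary to compute $\partial z/\partial x$ and
$\partial z/\partial y$ from this implicit relationship. To do so, notice that an obvious
modification of Eqn (5.23) gives
$$0 = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz,$$
but if $z$ could be obtained explicitly, so that $z = z(x, y)$, it would also follow
from Theorem 5.19 that
$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy.$$
Substitution of this result into the above expression gives
$$\left(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x}\right)dx
 + \left(\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y}\right)dy = 0,$$
and as $x$ and $y$ are independent variables, $dx$ and $dy$ are arbitrary so that this
expression can only be true if
$$\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x} = 0
 \quad\text{and}\quad
 \frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y} = 0.$$
Hence, we find that provided $\partial f/\partial z \neq 0$,
$$\frac{\partial z}{\partial x} = -\left(\frac{\partial f}{\partial x}\right)\Big/\left(\frac{\partial f}{\partial z}\right),
 \qquad
 \frac{\partial z}{\partial y} = -\left(\frac{\partial f}{\partial y}\right)\Big/\left(\frac{\partial f}{\partial z}\right).$$
We state this in the form of a further corollary.
Corollary 5.19 (b) If the real variables $x$, $y$, and $z$ are related by the implicit
equation $f(x, y, z) = 0$ and the first-order derivatives of $f$ exist and are
continuous, then
$$\frac{\partial z}{\partial x} = -\left(\frac{\partial f}{\partial x}\right)\Big/\left(\frac{\partial f}{\partial z}\right),
 \qquad
 \frac{\partial z}{\partial y} = -\left(\frac{\partial f}{\partial y}\right)\Big/\left(\frac{\partial f}{\partial z}\right)$$
when $\partial f/\partial z \neq 0$.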
Example 5.19
(a) Find $dy/dx$ given that $x^2y + \sin xy = 0$.
(b) Prove that $(d/dx)(x^r) = rx^{r-1}$ when $r$ is rational.
(c) Find $\partial z/\partial x$ and $\partial z/\partial y$ given that $f(x, y, z) = 0$, where
$f(x, y, z) = x^2 + 2xyz + z^3$.
Solution (a) We must apply Corollary 5.19 (a). As, in this case,
$$f(x, y) = x^2y + \sin xy$$
it follows that
$$\frac{\partial f}{\partial x} = 2xy + y \cos xy
 \quad\text{and}\quad
 \frac{\partial f}{\partial y} = x^2 + x \cos xy.$$
Hence, by Corollary 5.19 (a),
$$\frac{dy}{dx} = \frac{-(\partial f/\partial x)}{(\partial f/\partial y)} = -\left(\frac{2xy + y \cos xy}{x^2 + x \cos xy}\right)$$
whenever $x^2 + x \cos xy \neq 0$.
(b) We have already shown in Theorem 5.2 that if $y = x^n$, then
$dy/dx = nx^{n-1}$ for $n$ a positive or negative integer. Now we must show this result
is still true if the power involved is rational.
Let $y = x^r$ with $r = p/q$, where $p$ and $q$ are integers without any common
factor. Then $y = x^{p/q}$ implies, and is implied by, $y^q = x^p$. Let $f(x, y) = y^q - x^p$
so that our equation corresponds to $f(x, y) = 0$. Then there clearly exist
pairs of real numbers $(x, y)$ for which $y^q = x^p$, and by Theorem 5.2,
$\partial f/\partial y = qy^{q-1} \neq 0$ when $y \neq 0$ (that is, when $x \neq 0$), and both $\partial f/\partial y$ and
$\partial f/\partial x = -px^{p-1}$ are continuous functions. Hence the conditions of Corollary
5.19 (a) are satisfied so that by the second form of its statement we may write
$$qy^{q-1}\frac{dy}{dx} - px^{p-1} = 0.$$
Thus
$$\frac{dy}{dx} = \frac{p}{q}\,\frac{x^{p-1}}{y^{q-1}} = \frac{p}{q}\,\frac{x^{p-1}}{(x^{p/q})^{q-1}} = \frac{p}{q}\,x^{(p/q)-1}$$
when $x \neq 0$. In the event that $x = 0$ we have
$$\left.\frac{d}{dx}(x^{p/q})\right|_{x=0} = \lim_{x \to 0} \frac{x^{p/q}}{x}$$
whenever this limit exists, which it does when $p/q > 1$, and is then equal to
zero. This establishes our desired result for all $x$.
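(c) Here,
$$f(x, y, z) = x^2 + 2xyz + z^3$$
and so
$$\frac{\partial f}{\partial x} = 2x + 2yz, \qquad \frac{\partial f}{\partial y} = 2xz, \qquad \frac{\partial f}{\partial z} = 2xy + 3z^2.$$
Thus by Corollary 5.19 (b),
$$\frac{\partial z}{\partial x} = -\left(\frac{2x + 2yz}{2xy + 3z^2}\right)
 \quad\text{and}\quad
 \frac{\partial z}{\partial y} = \frac{-2xz}{2xy + 3z^2}.$$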
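The formula for $\partial z/\partial x$ in part (c) can be checked without solving for $z$ in closed form. The sketch below assumes the relation is $f(x, y, z) = 0$, finds $z$ by bisection near the hypothetical sample point $(1, 1)$, where $f$ is increasing in $z$ so the root is unique, and compares a difference quotient of $z$ with the formula. The bracket $[-10, 10]$ and the step `h` are arbitrary choices.

```python
def f(x, y, z):
    """Implicit relation of Example 5.19 (c), taken here as f(x, y, z) = 0."""
    return x**2 + 2*x*y*z + z**3

def solve_z(x, y, lo=-10.0, hi=10.0, iterations=200):
    """Bisection for the root z of f(x, y, z) = 0; assumes a sign change on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(x, y, lo) * f(x, y, mid) <= 0:
            hi = mid        # root lies in the lower half
        else:
            lo = mid        # root lies in the upper half
    return 0.5 * (lo + hi)

def dz_dx_formula(x, y, z):
    """Corollary 5.19 (b): dz/dx = -(df/dx)/(df/dz)."""
    return -(2*x + 2*y*z) / (2*x*y + 3*z**2)

x0, y0, h = 1.0, 1.0, 1e-5      # hypothetical sample point and step
z0 = solve_z(x0, y0)
numeric = (solve_z(x0 + h, y0) - solve_z(x0 - h, y0)) / (2 * h)
print(z0, numeric, dz_dx_formula(x0, y0, z0))
```

The point of the corollary is that the right-hand side needs only the partial derivatives of $f$; the root finder is used here solely to provide an independent check.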
5.7 Envelopes
A simple and useful application of the total differential is to the problem of
the determination of envelopes already touched upon in Section 2.5. Before
proceeding with this application we now formally define an envelope.
Definition 5.6 Let a family of curves $\Gamma$ in the $(x, y)$-plane with parameter
$\alpha$ be defined by the implicit equation
$$f(x, y, \alpha) = 0.$$
Then the envelope of the family $\Gamma$, when it exists, is that curve $\mathcal{E}$ which is
tangent to every member of the family.
Figure 5.16 (a) shows some representative members of the family $\Gamma$
corresponding to values $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_n$ of the parameter $\alpha$. Figure 5.16 (b)
shows the same situation on closely neighbouring curves $C_1$ and $C_2$ when the
parametric value for $C_2$ is $\alpha_0 + d\alpha$ which differs only by the differential $d\alpha$
from the parametric value $\alpha_0$ appropriate to $C_1$. We shall assume that the
curves $C_1$ and $C_2$ intersect at the point $P$ with coordinates $(x_0, y_0)$.
Fig. 5.16 Construction of envelope: (a) envelope of family of curves;
(b) neighbouring members of the family.
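Setting $u = f(x, y, \alpha)$, and regarding $x$, $y$, and $\alpha$ as variables, it follows
from Theorem 5.19 that
$$du = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial \alpha}\,d\alpha$$
and as the family is defined by setting $u = 0$ (constant) it then follows, as in
Eqn (5.23), that
$$0 = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial \alpha}\,d\alpha.$$
This equation which relates the differentials $dx$, $dy$, and $d\alpha$ to the
neighbouring curves $C_1$ and $C_2$ is, in particular, true at $P$. We signify this by
writing
$$0 = \left(\frac{\partial f}{\partial x}\right)_P dx + \left(\frac{\partial f}{\partial y}\right)_P dy + \left(\frac{\partial f}{\partial \alpha}\right)_P d\alpha, \tag{5.24}$$
where $(\cdot)_P$ denotes that the associated quantity is to be evaluated at $P$.
This equation is just the intersection condition for curves $C_1$ and $C_2$ at $P$.
As it is required of the envelope $\mathcal{E}$ that it be tangent to every member of
the family $\Gamma$ it follows that as $d\alpha \to 0$, so curve $C_1$ must tend to $C_2$ and the
gradient of the envelope $\mathcal{E}$ at $P$ must tend to the gradient of the tangent to
$C_1$ at $P$. To compute this we use the fact that $\alpha = \alpha_0$ is constant for curve $C_1$
so that the argument that gave rise to Eqn (5.24), when applied to
$f(x, y, \alpha_0) = 0$ gives the tangency condition
$$0 = \left(\frac{\partial f}{\partial x}\right)_P dx + \left(\frac{\partial f}{\partial y}\right)_P dy. \tag{5.25}$$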
Now both Eqns (5.24) and (5.25) must be simultaneously true for $\mathcal{E}$ and,
consequently, we arrive at the condition
$$\left(\frac{\partial f}{\partial \alpha}\right)_P d\alpha = 0,$$
which, since in general $d\alpha$ is a non-zero differential, can only be true if
$$\left(\frac{\partial f}{\partial \alpha}\right)_P = 0. \tag{5.26}$$
In addition to this result, the fact that $P$ is a point on $C_1$ implies that
$f(x_0, y_0, \alpha_0) = 0$ or, equivalently that
$$[f(x, y, \alpha)]_P = 0. \tag{5.27}$$
Both conditions (5.26) and (5.27) must be satisfied if the envelope $\mathcal{E}$ is to
pass through $P$ and be tangent to $C_1$ at that point, so that dropping the suffix
$P$, we see that $\mathcal{E}$ is the locus of all points for which
$$f(x, y, \alpha) = 0 \quad\text{and}\quad \frac{\partial}{\partial \alpha} f(x, y, \alpha) = 0. \tag{5.28}$$
Elimination of $\alpha$ between these two equations gives a relationship between
$x$ and $y$ which is the desired equation of the envelope $\mathcal{E}$. We have thus proved
the following result.
Theorem 5.20 (envelopes) When it exists, the equation of the envelope
$\mathcal{E}$ of the family of curves
$$f(x, y, \alpha) = 0$$
with parameter $\alpha$ is determined by the elimination of $\alpha$ between the equations
$$f(x, y, \alpha) = 0 \quad\text{and}\quad \frac{\partial}{\partial \alpha} f(x, y, \alpha) = 0.$$
Example 5-20 Determine the envelope $ of the family of curves
(x - a)2 + iy + a) 2 = 1,
SEC 5-7
ENVELOPES / 237
Fig. 517 Envelope of circles.
with parameter a.
Solution If we write the equation of this family of curves in the form
$$f(x, y, \alpha) = 0,$$
then we must set
$$f(x, y, \alpha) = (x - \alpha)^2 + (y + \alpha)^2 - 1.$$
Hence the equation $\partial f/\partial \alpha = 0$ corresponds to
$$-(x - \alpha) + (y + \alpha) = 0$$
or, equivalently, to
$$\alpha = \tfrac{1}{2}(x - y).$$
To determine the envelope, the conditions of Theorem 5-20 require that
$f(x, y, \alpha) = 0$ simultaneously with $\partial f/\partial \alpha = 0$. Hence substituting the value of the
parameter $\alpha$ arrived at above from the condition $\partial f/\partial \alpha = 0$ into the equation
of the family of curves $f(x, y, \alpha) = 0$ gives
$$\tfrac{1}{4}(x + y)^2 + \tfrac{1}{4}(x + y)^2 = 1$$
or,
$$x + y = \pm\sqrt{2}.$$
The desired envelope $\mathcal{E}$ thus comprises the two straight lines
$$y = \sqrt{2} - x \quad \text{and} \quad y = -\sqrt{2} - x.$$
This result could also have been deduced by geometrical arguments as
follows. The original family of curves comprises circles of unit radius, each
with its centre at $x = \alpha$, $y = -\alpha$. Consequently, the tangents to these
circles which form their envelope $\mathcal{E}$ must be straight lines parallel to the line
of centres $y = -x$ and separated from it by a unit distance (Fig. 5-17).
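The geometrical argument can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the sample parameter values are chosen here purely for illustration. It confirms that each circle centre $(\alpha, -\alpha)$ lies at unit distance from both lines $x + y = \pm\sqrt{2}$, and that the point of each line singled out by $\alpha = \tfrac{1}{2}(x - y)$ lies on the corresponding circle.

```python
import math

def distance_to_line(cx, cy, c):
    """Distance from the point (cx, cy) to the straight line x + y = c."""
    return abs(cx + cy - c) / math.sqrt(2.0)

def envelope_point(alpha, sign):
    """Point of the line x + y = sign*sqrt(2) at which (x - y)/2 equals alpha."""
    s = sign * math.sqrt(2.0)
    return (s / 2.0 + alpha, s / 2.0 - alpha)

# Illustrative parameter values (assumed, not taken from the text).
for alpha in (-2.0, 0.0, 0.5, 3.0):
    for sign in (+1, -1):
        d = distance_to_line(alpha, -alpha, sign * math.sqrt(2.0))
        x, y = envelope_point(alpha, sign)
        # Left-hand side of the circle equation; it should equal 1.
        lhs = (x - alpha) ** 2 + (y + alpha) ** 2
        print(f"alpha={alpha:5.1f} sign={sign:+d} distance={d:.12f} circle lhs={lhs:.12f}")
```

Every printed distance and every left-hand side should agree with 1 to rounding error, which is the statement that both lines touch every circle of the family.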
Although in this case it was possible to eliminate $\alpha$ from the equations
arising from Theorem 5-20, this is not generally possible. In the
next example we illustrate how on occasions $\alpha$ may be retained in a form
which allows the equation of the envelope to be expressed in parametric form.
Example 5-21 Find the envelope of the equation
$$(x - \alpha)^2 + y^2 = \frac{\alpha^2}{1 + \alpha^2},$$
where $\alpha$ is a parameter.
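Solution We again write the equation in the form
$$f(x, y, \alpha) = 0,$$
where this time
$$f(x, y, \alpha) = (x - \alpha)^2 + y^2 - \frac{\alpha^2}{1 + \alpha^2}.$$
Then
$$\frac{\partial f}{\partial \alpha} = -2(x - \alpha) - \frac{2\alpha}{1 + \alpha^2} + \frac{2\alpha^3}{(1 + \alpha^2)^2}$$
and hence the condition $\partial f/\partial \alpha = 0$ requires that
$$(x - \alpha) = \frac{\alpha^3}{(1 + \alpha^2)^2} - \frac{\alpha}{1 + \alpha^2}.$$
Now this is a specially simple situation because y is absent from the equation
$\partial f/\partial \alpha = 0$, which allows us to solve immediately for x in terms of $\alpha$ to get
$$x = \alpha^3 \left[\frac{2 + \alpha^2}{(1 + \alpha^2)^2}\right]. \qquad \text{(A)}$$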
To find the envelope $\mathcal{E}$, Theorem 5-20 requires that in addition to satisfying
the condition $\partial f/\partial \alpha = 0$ we must also require that $f(x, y, \alpha) = 0$.
Using the form of $(x - \alpha)$ given above this is easily seen to be equivalent to
requiring that
$$\left[\frac{\alpha^3}{(1 + \alpha^2)^2} - \frac{\alpha}{1 + \alpha^2}\right]^2 + y^2 = \frac{\alpha^2}{1 + \alpha^2}.$$
This may now be solved for y in terms of $\alpha$ to obtain
$$y = \frac{\pm\alpha^2 (3 + 3\alpha^2 + \alpha^4)^{1/2}}{(1 + \alpha^2)^2}. \qquad \text{(B)}$$
The coordinates (x, y) of points on the envelope $\mathcal{E}$ are thus determined in
terms of $\alpha$ by equations (A) and (B). Although it is not possible to eliminate
$\alpha$ between these equations to obtain an explicit representation for the envelope
$\mathcal{E}$ in terms of x and y, this is of no real importance as we have obtained the
equations of $\mathcal{E}$ in parametric form, which is equally satisfactory. Different
values of $\alpha$ will determine different points $(x(\alpha), y(\alpha))$ on the envelope $\mathcal{E}$.
This example has in fact provided the detailed solution to the problem
first studied in Section 2-5. Notice that for large values of $\alpha$ we have $x \to \alpha$
and $y \to \pm 1$, as was deduced from purely geometrical considerations when
the problem was first examined.
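The parametric equations (A) and (B) can be tested by substituting them back into the two envelope conditions. The sketch below is an illustration only: the function names and the sample values of the parameter are assumptions made here. It evaluates $f$ and $\partial f/\partial \alpha$ at points $(x(\alpha), y(\alpha))$ and also looks at the behaviour for a large parameter value.

```python
import math

def f(x, y, a):
    """Family of curves written as f(x, y, alpha) = 0."""
    return (x - a) ** 2 + y ** 2 - a ** 2 / (1.0 + a ** 2)

def df_dalpha(x, y, a):
    """Partial derivative of f with respect to the parameter alpha."""
    return (-2.0 * (x - a)
            - 2.0 * a / (1.0 + a ** 2)
            + 2.0 * a ** 3 / (1.0 + a ** 2) ** 2)

def envelope(a, sign=+1):
    """Point on the envelope from equations (A) and (B); sign picks the branch of y."""
    x = a ** 3 * (2.0 + a ** 2) / (1.0 + a ** 2) ** 2
    y = sign * a ** 2 * math.sqrt(3.0 + 3.0 * a ** 2 + a ** 4) / (1.0 + a ** 2) ** 2
    return x, y

# Both residuals should vanish to rounding error for any alpha.
for a in (0.3, 1.0, 2.5):
    for sign in (+1, -1):
        x, y = envelope(a, sign)
        print(a, sign, f(x, y, a), df_dalpha(x, y, a))

# Large alpha: x approaches alpha and y approaches +1 or -1.
print(envelope(1000.0), envelope(1000.0, -1))
```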
5-8 The chain rule and its consequences
If, in Theorem 5-19, the variables $x_1, x_2, \ldots, x_n$ are specified in terms of a
parameter t, say, then the result requires slight modification. Suppose that
$$x_1 = x_1(t), \quad x_2 = x_2(t), \quad \ldots, \quad x_n = x_n(t),$$
which are all differentiable functions of t. Then the variable u becomes a
function of the single real variable t, for we may write
$$u = F(t), \qquad \text{(5-29)}$$
where $F(t) = f(x_1(t), x_2(t), \ldots, x_n(t))$.
Hence by an obvious adaptation of Eqn (5-11) defining differentials we
may write
$$du = F'(t)\,dt, \qquad \text{(5-30)}$$
where, of course, $F'(t) = du/dt$ is the derivative of u with respect to t.
However, by a further application of Eqn (5-11) to each of the variables
$x_1 = x_1(t)$, $x_2 = x_2(t)$, ..., $x_n = x_n(t)$ we have the result
$$dx_1 = \left(\frac{dx_1}{dt}\right) dt, \quad dx_2 = \left(\frac{dx_2}{dt}\right) dt, \quad \ldots, \quad dx_n = \left(\frac{dx_n}{dt}\right) dt. \qquad \text{(5-31)}$$
Substituting these expressions for the differentials $dx_i$ in terms of the
differential dt into the statement of Theorem 5-19 gives
$$du = \left(\frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt}\right) dt. \qquad \text{(5-32)}$$
Finally, a comparison of Eqns (5-30) and (5-32) shows that
$$F'(t) = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt}.$$
As $F'(t) = du/dt$, this result facilitates the calculation of du/dt without the
need for formal substitution into $u = f(x_1, x_2, \ldots, x_n)$ of the values
$$x_1 = x_1(t), \quad x_2 = x_2(t), \quad \ldots, \quad x_n = x_n(t).$$
We have proved the following useful result.
theorem 5-21 (chain rule for partial derivatives) Let $u = f(x_1, x_2, \ldots, x_n)$
be a real valued function of n real variables and let its first-order partial
derivatives exist and be continuous. Further, let each of the variables $x_1, x_2,
\ldots, x_n$ be a differentiable function of the single real variable t so that we
may write
$$x_1 = x_1(t), \quad x_2 = x_2(t), \quad \ldots, \quad x_n = x_n(t).$$
Then the total derivative of u with respect to t is given by
$$\frac{du}{dt} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt}.$$
Two special cases of this theorem are of sufficient importance to merit
recording as corollaries. The first arises when / is a function of only two
variables between which an explicit relationship exists, and the parameter t is
identified with one of these variables.
As only two variables are involved we shall avoid the use of numerical
suffixes by agreeing to write $x_1 = x$ and $x_2 = y$ where, by supposition,
y = y(x) is some known explicit relation. The statement of Theorem 5-21
then becomes
$$\frac{du}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$
If, now, we identify t with x, then t = x and dx/dt = 1, dy/dt = dy/dx so
that the above result becomes
$$\frac{du}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.$$
The expression on the right-hand side is the total derivative of u with respect
to x. The first term on the right takes account of the change directly due to x
whilst the second term takes account of the fact that y is itself a function of x.
This result enables du/dx to be obtained without needing to substitute
y = y(x) in the relation u = f(x, y).
Corollary 5-21 (a) If u = f(x, y) is a real valued function of the real
variables x and y with continuous first-order derivatives and y is related to
x by the explicit equation y = y(x), then
$$\frac{du}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.$$
More generally, suppose that u = f(x, y) whilst x and y are related
implicitly by the equation
$$g(x, y) = 0.$$
How must we modify our previous argument in order that we may compute
the total derivative du/dx? The result of Corollary 5-21 (a) is still true but
obviously dy/dx now depends on the form of g. To find the form of dy/dx we
can use Corollary 5-19 (a), writing f = g, to see that
$$\frac{dy}{dx} = -\left(\frac{\partial g}{\partial x}\right) \Big/ \left(\frac{\partial g}{\partial y}\right),$$
showing that
$$\frac{du}{dx} = \frac{\partial f}{\partial x} - \left(\frac{\partial f}{\partial y}\right)\left(\frac{\partial g}{\partial x}\right) \Big/ \left(\frac{\partial g}{\partial y}\right),$$
provided $\partial g/\partial y \neq 0$. We state this as our next result.
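Corollary 5-21 (b) If u = f(x, y) is a real valued function of the real variables
x and y with continuous first-order derivatives, and y is related implicitly to
x by the equation g(x, y) = 0, then
$$\frac{du}{dx} = \frac{\partial f}{\partial x} - \left(\frac{\partial f}{\partial y}\right)\left(\frac{\partial g}{\partial x}\right) \Big/ \left(\frac{\partial g}{\partial y}\right),$$
provided $\partial g/\partial y \neq 0$.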
Example 5-22 Determine the derivative du/dt given that
$u = \sin(x^2 + y^2)$ with $x = 3t$, $y = 1/(1 + t^2)$.
Solution We must apply Theorem 5-21 making the identifications $x_1 = x$,
$x_2 = y$, and $f(x, y) = \sin(x^2 + y^2)$ with $x = 3t$ and $y = 1/(1 + t^2)$. Hence
$$\frac{\partial f}{\partial x} = 2x\cos(x^2 + y^2), \quad \frac{\partial f}{\partial y} = 2y\cos(x^2 + y^2)$$
whilst
$$\frac{dx}{dt} = 3, \quad \frac{dy}{dt} = \frac{-2t}{(1 + t^2)^2}.$$
Substituting in Theorem 5-21,
$$\frac{du}{dt} = 2x\cos(x^2 + y^2) \cdot (3) + 2y\cos(x^2 + y^2)\left[\frac{-2t}{(1 + t^2)^2}\right]$$
or
$$\frac{du}{dt} = 2\cos(x^2 + y^2)\left[3x - \frac{2yt}{(1 + t^2)^2}\right].$$
Using the known relationships between x, y, and t, the derivative du/dt can
thus be computed for any desired value of t. The details are left to the reader.
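As an independent check on this example, the chain-rule expression for du/dt can be compared with a finite-difference estimate obtained by substituting x(t) and y(t) directly into u. This is a minimal sketch under stated assumptions: the function names, the step length and the sample values of t are all chosen here for illustration.

```python
import math

def x_of(t):
    return 3.0 * t

def y_of(t):
    return 1.0 / (1.0 + t * t)

def u_of(t):
    """u as a function of t alone, by direct substitution."""
    return math.sin(x_of(t) ** 2 + y_of(t) ** 2)

def du_dt_chain(t):
    """du/dt from Theorem 5-21: f_x * dx/dt + f_y * dy/dt."""
    x, y = x_of(t), y_of(t)
    return 2.0 * math.cos(x * x + y * y) * (3.0 * x - 2.0 * y * t / (1.0 + t * t) ** 2)

def du_dt_numeric(t, h=1e-6):
    """Central-difference estimate of du/dt (h is an assumed step length)."""
    return (u_of(t + h) - u_of(t - h)) / (2.0 * h)

for t in (0.0, 0.7, 1.3):
    print(t, du_dt_chain(t), du_dt_numeric(t))
```

The two columns should agree to within the accuracy of the difference quotient, which illustrates that the chain rule gives du/dt without the substitution having to be carried out symbolically.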
Example 5-23 Determine the total derivative du/dx in each case:
(a) $u = x\cos y + y\cos x$ when $y = 1 + x + x^3$;
(b) $u = x^2 + 2xy - y^2$ when $x^2 + y^2 + \cos xy = 0$.
Solution (a) This requires an application of Corollary 5-21 (a). We set
$$f(x, y) = x\cos y + y\cos x \quad \text{and} \quad y = 1 + x + x^3$$
so that
$$\frac{\partial f}{\partial x} = \cos y - y\sin x, \quad \frac{\partial f}{\partial y} = -x\sin y + \cos x$$
and
$$\frac{dy}{dx} = 1 + 3x^2.$$
Hence, substituting into Corollary 5-21 (a),
$$\frac{du}{dx} = \cos y - y\sin x + (\cos x - x\sin y)(1 + 3x^2).$$
(b) In this case we use Corollary 5-21 (b), with
$$f(x, y) = x^2 + 2xy - y^2 \quad \text{and} \quad g(x, y) = x^2 + y^2 + \cos xy.$$
Hence
$$\frac{\partial f}{\partial x} = 2x + 2y, \quad \frac{\partial f}{\partial y} = 2x - 2y,$$
$$\frac{\partial g}{\partial x} = 2x - y\sin xy, \quad \frac{\partial g}{\partial y} = 2y - x\sin xy.$$
Finally, applying Corollary 5-21 (b),
$$\frac{du}{dx} = 2(x + y) - \frac{2(x - y)(2x - y\sin xy)}{(2y - x\sin xy)}.$$
5-9 Change of variable
This section discusses a somewhat more complicated situation than that
covered by Theorem 5-21, namely, the implications for partial differentiation
of changing the independent variables in a function $u = f(x_1, x_2, \ldots, x_n)$
that is to be differentiated. This situation commonly occurs as a result of
changing coordinate systems to suit physical problems, as the following
example illustrates. Suppose that p = p(x, y, z) is the pressure in a fluid
flowing parallel to the z-axis. Then $\partial p/\partial z$ is the pressure gradient along the
direction of flow and $\partial p/\partial x$, $\partial p/\partial y$ are the transverse pressure gradients in the
plane z = constant.
Fig. 5-18 Cylindrical polar coordinates.
Now, if the flow takes place in a rectangular duct with sides described by
x = constant, y = constant, then the Cartesian coordinates O{x, y, z} are
obviously the natural ones to use. However, if the flow takes place in a
cylindrical pipe, then the z-axis is still convenient as it can be aligned with the
axis of the pipe, but the x-, y-axes are now less useful since the wall of the
pipe becomes the curve $x^2 + y^2$ = constant. Clearly, a more sensible coordinate
system would be the cylindrical polar coordinates $r, \theta, z'$ in which r and
$\theta$ define a point in the plane z' = constant. Figure 5-18 illustrates this idea.
The plane z = z' = 0 is the same in both the O{x, y, z} and O{$r, \theta, z'$} systems
of axes, and is denoted by $\Pi$. Relative to these two systems the point P has the
coordinates O{x, y, z} and O{$r, \theta, z'$}, respectively, where
$$x = r\cos\theta, \quad y = r\sin\theta, \quad z = z'. \qquad \text{(5-33)}$$
How can the pressure gradients described by the partial derivatives
$\partial p/\partial r$, $\partial p/\partial \theta$, and $\partial p/\partial z'$ be determined from Eqn (5-33) and the known
functions $\partial p/\partial x$, $\partial p/\partial y$, and $\partial p/\partial z$? The rest of this section is devoted to solving
this type of problem. Notice that from the definition of partial differentiation,
$\partial p/\partial z$ and $\partial p/\partial z'$ have essentially the same meaning, whereas $\partial p/\partial r$ is the
derivative of p computed along a radius with $\theta$ and z' held constant, whilst
$\partial p/\partial \theta$ is the derivative of p tangential to a circle r = constant drawn on the
plane z' = constant.
Although the replacement of coordinate variables in this manner involves
replacing a set of n independent variables by a new set also comprising n in
number (n = 3 above), we shall first prove a more general result. Specifically,
consider the implication of the situation in which
$$u = f(x_1, x_2, \ldots, x_n), \qquad \text{(5-34)}$$
when the independent variables $x_1, x_2, \ldots, x_n$ are themselves differentiable
functions of another set of variables which we denote by $\alpha_1, \alpha_2, \ldots, \alpha_m$.
It is not necessary that m should equal n. Thus we have
$$\begin{aligned} x_1 &= x_1(\alpha_1, \alpha_2, \ldots, \alpha_m), \\ x_2 &= x_2(\alpha_1, \alpha_2, \ldots, \alpha_m), \\ &\;\;\vdots \\ x_n &= x_n(\alpha_1, \alpha_2, \ldots, \alpha_m). \end{aligned} \qquad \text{(5-35)}$$
If the variables $x_i$ in Eqn (5-34) were to be replaced by the equivalent functions
(5-35) involving the variables $\alpha_i$, then f would become some function
$F(\alpha_1, \alpha_2, \ldots, \alpha_m)$ of $\alpha_1, \alpha_2, \ldots, \alpha_m$ so that by Theorem 5-19 we could write
$$du = \frac{\partial F}{\partial \alpha_1} d\alpha_1 + \frac{\partial F}{\partial \alpha_2} d\alpha_2 + \cdots + \frac{\partial F}{\partial \alpha_m} d\alpha_m. \qquad \text{(5-36)}$$
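Next, observe that by applying this same theorem to the equation for
$x_i$ in Eqn (5-35) we obtain
$$dx_i = \frac{\partial x_i}{\partial \alpha_1} d\alpha_1 + \frac{\partial x_i}{\partial \alpha_2} d\alpha_2 + \cdots + \frac{\partial x_i}{\partial \alpha_m} d\alpha_m, \qquad \text{(5-37)}$$
for i = 1, 2, ..., n.
Substituting these expressions into the statement of Theorem 5-19 then
gives
$$du = \frac{\partial f}{\partial x_1}\left[\frac{\partial x_1}{\partial \alpha_1} d\alpha_1 + \cdots + \frac{\partial x_1}{\partial \alpha_m} d\alpha_m\right] + \cdots + \frac{\partial f}{\partial x_n}\left[\frac{\partial x_n}{\partial \alpha_1} d\alpha_1 + \cdots + \frac{\partial x_n}{\partial \alpha_m} d\alpha_m\right]. \qquad \text{(5-38)}$$
On re-arrangement this becomes
$$du = \left[\frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_1} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_1} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_1}\right] d\alpha_1 + \cdots + \left[\frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_m} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_m} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_m}\right] d\alpha_m. \qquad \text{(5-39)}$$
Since $f(x_1, x_2, \ldots, x_n) = F(\alpha_1, \alpha_2, \ldots, \alpha_m)$ it follows by a direct
comparison of the ith terms of Eqns (5-36) and (5-39) that
$$\frac{\partial F}{\partial \alpha_i} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_i} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_i} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_i} \qquad \text{(5-40)}$$
for i = 1, 2, ..., m.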
We state this result in the form of a general theorem.
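theorem 5-22 (change of variable) Let $f(x_1, x_2, \ldots, x_n)$ be a real valued
function of the real variables $x_1, x_2, \ldots, x_n$ whose first-order derivatives exist
and are continuous. Further, let $x_1 = x_1(\alpha_1, \alpha_2, \ldots, \alpha_m)$, $x_2 = x_2(\alpha_1, \alpha_2, \ldots, \alpha_m)$,
..., $x_n = x_n(\alpha_1, \alpha_2, \ldots, \alpha_m)$ be differentiable functions of the real variables
$\alpha_1, \alpha_2, \ldots, \alpha_m$. Then
$$\frac{\partial f}{\partial \alpha_1} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_1} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_1} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_1},$$
$$\frac{\partial f}{\partial \alpha_2} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_2} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_2} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_2},$$
$$\vdots$$
$$\frac{\partial f}{\partial \alpha_m} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_m} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_m} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_m}.$$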
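Example 5-24 Express $\partial f/\partial r$, $\partial f/\partial \theta$, and $\partial f/\partial z'$ in terms of $\partial f/\partial x$, $\partial f/\partial y$, and
$\partial f/\partial z$ given that $x = r\cos\theta$, $y = r\sin\theta$, $z = z'$. Find their values given that
$f(x, y, z) = x^2 + 3xy + y^2 + z^2$.
Solution We must apply Theorem 5-22 with m = n = 3 by making the
identifications $x_1 = x$, $x_2 = y$, $x_3 = z$ and $\alpha_1 = r$, $\alpha_2 = \theta$, $\alpha_3 = z'$. Our
first result is
$$\frac{\partial f}{\partial r} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial r},$$
$$\frac{\partial f}{\partial \theta} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial \theta},$$
$$\frac{\partial f}{\partial z'} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial z'} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial z'} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial z'}.$$
However,
$$\frac{\partial x}{\partial r} = \cos\theta, \quad \frac{\partial x}{\partial \theta} = -r\sin\theta, \quad \frac{\partial x}{\partial z'} = 0, \quad \frac{\partial y}{\partial r} = \sin\theta,$$
$$\frac{\partial y}{\partial \theta} = r\cos\theta, \quad \frac{\partial y}{\partial z'} = 0, \quad \frac{\partial z}{\partial r} = 0, \quad \frac{\partial z}{\partial \theta} = 0, \quad \frac{\partial z}{\partial z'} = 1.$$
Hence, substituting these values into the above transformation equations
shows that
$$\frac{\partial f}{\partial r} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta,$$
$$\frac{\partial f}{\partial \theta} = -\frac{\partial f}{\partial x} r\sin\theta + \frac{\partial f}{\partial y} r\cos\theta,$$
$$\frac{\partial f}{\partial z'} = \frac{\partial f}{\partial z}.$$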
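Next, using the fact that $f(x, y, z) = x^2 + 3xy + y^2 + z^2$ we see that
$$\frac{\partial f}{\partial x} = 2x + 3y, \quad \frac{\partial f}{\partial y} = 3x + 2y, \quad \frac{\partial f}{\partial z} = 2z,$$
so that
$$\frac{\partial f}{\partial r} = (2x + 3y)\cos\theta + (3x + 2y)\sin\theta.$$
However, as $r^2 = x^2 + y^2$ and $\cos\theta = x/(x^2 + y^2)^{1/2}$, $\sin\theta = y/(x^2 + y^2)^{1/2}$,
this result simplifies to
$$\frac{\partial f}{\partial r} = \frac{2x^2 + 6xy + 2y^2}{(x^2 + y^2)^{1/2}}.$$
A similar calculation shows that
$$\frac{\partial f}{\partial \theta} = 3(x^2 - y^2), \quad \frac{\partial f}{\partial z'} = 2z.$$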
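The three results of this example can be checked by differentiating f numerically in the cylindrical polar variables. The sketch below is illustrative only: the helper names, the step length and the test point are assumptions introduced here. It compares the closed forms with central-difference estimates of the derivatives with respect to $r$, $\theta$ and $z'$.

```python
import math

def f(x, y, z):
    return x * x + 3.0 * x * y + y * y + z * z

def f_cyl(r, theta, zp):
    """f expressed in cylindrical polars via x = r cos(theta), y = r sin(theta), z = z'."""
    return f(r * math.cos(theta), r * math.sin(theta), zp)

def df_dr(r, theta, zp):
    x, y = r * math.cos(theta), r * math.sin(theta)
    return (2.0 * x * x + 6.0 * x * y + 2.0 * y * y) / math.hypot(x, y)

def df_dtheta(r, theta, zp):
    x, y = r * math.cos(theta), r * math.sin(theta)
    return 3.0 * (x * x - y * y)

def df_dzp(r, theta, zp):
    return 2.0 * zp

def central(g, args, i, h=1e-6):
    """Central-difference estimate of the partial derivative of g in its i-th argument."""
    lo, hi = list(args), list(args)
    lo[i] -= h
    hi[i] += h
    return (g(*hi) - g(*lo)) / (2.0 * h)

point = (1.5, 0.8, 2.0)  # assumed sample values of (r, theta, z')
for i, exact in enumerate((df_dr, df_dtheta, df_dzp)):
    print(exact(*point), central(f_cyl, point, i))
```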
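Consider the special case of Theorem 5-22 that results when m = n = 2,
so that its statement becomes
$$\begin{aligned} \frac{\partial f}{\partial \alpha_1} &= \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_1} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_1}, \\ \frac{\partial f}{\partial \alpha_2} &= \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_2} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_2}. \end{aligned} \qquad \text{(5-41)}$$
Now for any differentiable function $f(x_1, x_2)$, once the variable change has
been decided, these equations express the partial derivatives $f_{\alpha_1}$, $f_{\alpha_2}$ in terms
of $f_{x_1}$, $f_{x_2}$ which we suppose to be known. However, if $\partial f/\partial \alpha_1$ and $\partial f/\partial \alpha_2$ are
supposed known, then Eqns (5-41) can be regarded as simultaneous equations
for $f_{x_1}$, $f_{x_2}$. Thus, provided the simultaneous equations can be solved, Eqns
(5-41) may be regarded as describing a one-one transformation, or mapping,
between partial derivatives of f with respect to $(x_1, x_2)$ and $(\alpha_1, \alpha_2)$.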
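It is easily seen that provided $J(x_1, x_2) \neq 0$, we have
$$\begin{aligned} \frac{\partial f}{\partial x_1} &= \left(\frac{\partial f}{\partial \alpha_1}\frac{\partial x_2}{\partial \alpha_2} - \frac{\partial f}{\partial \alpha_2}\frac{\partial x_2}{\partial \alpha_1}\right) \Big/ J(x_1, x_2), \\ \frac{\partial f}{\partial x_2} &= \left(\frac{\partial f}{\partial \alpha_2}\frac{\partial x_1}{\partial \alpha_1} - \frac{\partial f}{\partial \alpha_1}\frac{\partial x_1}{\partial \alpha_2}\right) \Big/ J(x_1, x_2), \end{aligned} \qquad \text{(5-42)}$$
where
$$J(x_1, x_2) = \frac{\partial x_1}{\partial \alpha_1}\frac{\partial x_2}{\partial \alpha_2} - \frac{\partial x_1}{\partial \alpha_2}\frac{\partial x_2}{\partial \alpha_1} = \begin{vmatrix} \dfrac{\partial x_1}{\partial \alpha_1} & \dfrac{\partial x_1}{\partial \alpha_2} \\ \dfrac{\partial x_2}{\partial \alpha_1} & \dfrac{\partial x_2}{\partial \alpha_2} \end{vmatrix}. \qquad \text{(5-43)}$$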
The expression $J(x_1, x_2)$ is the Jacobian of the transformation and is
usually written in the form of the functional determinant shown in Eqn
(5-43). If the Jacobian vanishes at any point in the $(\alpha_1, \alpha_2)$-space then at such
points the transformation we are discussing becomes invalid and is said to be
singular. This is because at such points Eqns (5-41) can no longer be solved
uniquely, so that the one-one relationship between partial derivatives in the
two coordinate systems breaks down. In more advanced
discussions, the Jacobian is shown to play a fundamental role in all matters
relating to changes of variable. Sometimes, to emphasize the variables involved,
in place of $J(x_1, x_2)$ the alternative notation $\partial(x_1, x_2)/\partial(\alpha_1, \alpha_2)$ is
used. This idea is readily extended to more than two independent variables as
would be appropriate in Example 5-24, where three variables are involved.
The non-vanishing of the Jacobian is thus seen to provide an essential
condition for the partial derivatives of any differentiable function f, with
respect to $(x_1, x_2)$ and $(\alpha_1, \alpha_2)$, to be interchangeable by virtue of Eqns (5-42).
Example 5-25 Find the Jacobians of the following transformations and
state where, if at all, they vanish:
(a) $x = r\cos\theta$, $y = r\sin\theta$ (polar coordinates);
(b) $x = u + v$, $y = u - v$;
(c) $x = 3u^2 + v^2$, $y = u + v$.
Solution
(a) $$J(x, y) = \frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r(\cos^2\theta + \sin^2\theta) = r.$$
Hence in the case of polar coordinates the Jacobian vanishes at r = 0
(that is, at the origin) which is the only singular point of the transformation.
(b) $$J(x, y) = \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix} = -2.$$
This Jacobian never vanishes so that the transformation is always permissible.
(c) $$J(x, y) = \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} 6u & 2v \\ 1 & 1 \end{vmatrix} = 6u - 2v.$$
The Jacobian vanishes when 3u = v, so that the transformation is invalid, or
singular, at all points on that line in the (u, v)-plane.
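When a transformation is too awkward to differentiate by hand, its Jacobian can be estimated numerically. The following is a minimal sketch, assuming a hypothetical helper named `jacobian` that forms the 2 x 2 functional determinant from central differences; the step length and the sample points are also assumptions. It is applied to the three transformations of this example.

```python
import math

def jacobian(transform, p, q, h=1e-6):
    """Estimate d(x, y)/d(p, q) for transform(p, q) -> (x, y) by central differences."""
    x_hi, y_hi = transform(p + h, q)
    x_lo, y_lo = transform(p - h, q)
    x_p, y_p = (x_hi - x_lo) / (2.0 * h), (y_hi - y_lo) / (2.0 * h)
    x_hi, y_hi = transform(p, q + h)
    x_lo, y_lo = transform(p, q - h)
    x_q, y_q = (x_hi - x_lo) / (2.0 * h), (y_hi - y_lo) / (2.0 * h)
    # Determinant with rows (x_p, x_q) and (y_p, y_q), as in Eqn (5-43).
    return x_p * y_q - x_q * y_p

def polar(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def linear(u, v):
    return (u + v, u - v)

def quadratic(u, v):
    return (3.0 * u * u + v * v, u + v)

print(jacobian(polar, 2.0, 0.3))      # close to r = 2
print(jacobian(linear, 0.4, -1.1))    # close to -2 everywhere
print(jacobian(quadratic, 1.0, 3.0))  # close to 6u - 2v = 0, a point on the line 3u = v
print(jacobian(quadratic, 1.0, 1.0))  # close to 6u - 2v = 4
```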
5-10 Implicit functions
We have already used implicit functions when discussing various consequences
of total differentials, and will now examine these ideas more closely. Consider
the equation f(x, y) = 0. Often the argument is used that from this implicit
function of x and y we can, in principle, solve for y, and as y depends on x,
we are entitled to express y in the explicit form $y = \varphi(x)$.
Suppose that $f(x, y) = x^2 + y^2 + 1$. Then no real values of x and y
satisfy the implicit equation f(x, y) = 0, so certainly in this case one cannot
solve for y. Thus a necessary condition that we may solve for y near to some
point P with coordinates $(x_0, y_0)$ is that there are real numbers $x_0$, $y_0$ such
that $f(x_0, y_0) = 0$.
Now let u = f(x, y) be the graph of f(x, y), and assume that $f_x$ and $f_y$
exist and are continuous so that the graph will be a smooth surface of the
type shown in Fig. 5-19. Then f(x, y) = 0 is the curve of the section of this
surface by the plane u = 0. In general the curve of the section will be similar
to the smooth curve L shown in the figure and can be described by an equation
of the form $y = \varphi(x)$. This will obviously be the case provided firstly,
that the surface u = f(x, y) and the plane u = 0 intersect and secondly, that
they are nowhere tangential. The curve L will be smooth, and the function
$\varphi(x)$ differentiable, because the assumed continuity of the derivatives $f_x$ and
$f_y$ will ensure that the surface u = f(x, y) is itself smooth, and so will generate
a smooth curve of section. This is, of course, the assertion made in Corollary
5-19 (a). Let P be a representative point on L with coordinates $(x_0, y_0)$ in the
u = 0 plane, and line l be drawn tangential to the surface u = f(x, y) at P
in the plane $x = x_0$. Then by Definition 5-5, the angle $\alpha$ between line l and
the plane u = 0 is such that $\tan\alpha = (\partial f/\partial y)_{(x_0, y_0)}$.
Fig. 5-19 The function $y = \varphi(x)$ defined by the intersection of u = f(x, y) and the
plane u = 0.
Hence the condition that the surface u = f(x, y) and the plane u = 0
should not be tangential at P is seen to be $f_y(x_0, y_0) \neq 0$. Collecting our
results we now formulate them as the following theorem.
theorem 5-23 (implicit function theorem) Let f(x, y) be differentiable and
have continuous first-order partial derivatives near to $(x_0, y_0)$ at which
$f(x_0, y_0) = 0$ and $f_y(x_0, y_0) \neq 0$. Then, near $(x_0, y_0)$, it is possible to solve the
implicit equation f(x, y) = 0 uniquely for y in the explicit form $y = \varphi(x)$,
where $\varphi(x)$ is differentiable. That is, near to $(x_0, y_0)$, $f(x, \varphi(x)) = 0$.
Notice that this theorem is only of the existence type in that it ensures
that an explicit representation $y = \varphi(x)$ exists, but gives no information on
how such a representation may be found in any specific case.
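In practice the values of such an implicitly defined function are often found numerically. The sketch below is not taken from the text: it uses Newton's iteration in y (a technique introduced here as an assumption), with a hypothetical helper `solve_y` and the illustrative choice $f(x, y) = x^2 + y^2 - 1$ near the point (0.6, 0.8), at which $f_y \neq 0$. It also compares a difference quotient of the computed values with the slope $-f_x/f_y$ given by Corollary 5-19 (a).

```python
import math

def f(x, y):
    return x * x + y * y - 1.0

def f_x(x, y):
    return 2.0 * x

def f_y(x, y):
    return 2.0 * y

def solve_y(x, y_start, tol=1e-12, max_iter=50):
    """Solve f(x, y) = 0 for y by Newton's iteration, starting from y_start."""
    y = y_start
    for _ in range(max_iter):
        step = f(x, y) / f_y(x, y)  # requires f_y != 0, as in the theorem
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")

x0, y0 = 0.6, 0.8  # assumed point on the curve with f_y(x0, y0) = 1.6
h = 1e-5
y_left, y_right = solve_y(x0 - h, y0), solve_y(x0 + h, y0)

print(solve_y(0.61, y0), math.sqrt(1.0 - 0.61 ** 2))           # computed and exact values
print((y_right - y_left) / (2.0 * h), -f_x(x0, y0) / f_y(x0, y0))  # slope estimates
```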
As a corollary to this theorem, consider the relationship between the
derivatives of a function and its inverse. Let F(x, y) = y - f(x), so that
F(x, y) = 0 implies the relationship y = f(x). Suppose that at some point
$(x_0, y_0)$ we have $f'(x_0) \neq 0$ and $y_0 = f(x_0)$. Then, noticing that
$\partial F/\partial x = (\partial/\partial x)[-f(x)] = (d/dx)[-f(x)] = -f'(x)$ and $\partial F/\partial y = 1$, it follows from
Theorem 5-23 that close to $(x_0, y_0)$ we may solve for x as a function of y to
obtain an inverse function $x = \varphi(y)$. That is, $F(\varphi(y), y) = y - f[\varphi(y)] = 0$.
Furthermore, applying Corollary 5-19 (a) to F(x, y) = 0 and regarding
y as the independent variable and x as the dependent variable, we have
$$\frac{dx}{dy} = -\left(\frac{\partial F}{\partial y}\right) \Big/ \left(\frac{\partial F}{\partial x}\right),$$
so that provided $f'(x) \neq 0$, we have
$$\frac{dx}{dy} = 1/f'(x) \quad \text{or} \quad \varphi'(y) = 1/f'(x),$$
which is the desired result.
Corollary 5-23 Let y = f(x) be a real valued differentiable function of x
close to some point $(x_0, y_0)$ at which $y_0 = f(x_0)$. Let $x = \varphi(y)$ be the function
inverse to it close to the same point $(x_0, y_0)$ so that $x_0 = \varphi(y_0)$, and let
$f'(x_0) \neq 0$. Then close to $(x_0, y_0)$, we have
$$\varphi'(y) = 1/f'(x)$$
or, equivalently,
$$\frac{dx}{dy} = 1 \Big/ \left(\frac{dy}{dx}\right).$$
This corollary has two important applications which we mention next. The
first application of Corollary 5-23 is to the differentiation of inverse circular
functions. In Section 2-2, we agreed to write
$$y = \text{arc sin } x \quad \text{when} \quad x = \sin y \quad \text{and} \quad -\pi/2 < y < \pi/2.$$
Now,
$$\frac{d}{dy}(\sin y) = \cos y \neq 0 \quad \text{for} \quad -\pi/2 < y < \pi/2;$$
that is, for -1 < x < 1 and so, by Corollary 5-23,
$$\frac{dy}{dx} = 1 \Big/ \left(\frac{dx}{dy}\right) = \frac{1}{\cos y} = \frac{1}{\sqrt{(1 - \sin^2 y)}} = \frac{1}{\sqrt{(1 - x^2)}}.$$
The positive square root has been taken here because the principal branch of
the function y = arc sin x is a monotonic increasing function of x in its
domain of definition -1 < x < 1. By this same argument, the negative
square root is taken when differentiating the principal branch of the function
y = arc cos x which is a monotonic decreasing function of x in its domain
of definition -1 < x < 1. Thus
$$\frac{d}{dx}(\text{arc sin } x) = \frac{1}{\sqrt{(1 - x^2)}} \quad \text{for} \quad -1 < x < 1.$$
Similar arguments establish Table 5-2. In the entries for the derivatives
of arc cosec and arc sec, the term \x\ has been introduced to take account of
the two separate cases that need consideration when deriving these results;
namely, when x > a and when x < —a. These same ideas will be encountered
again in the next chapter in connection with Table 6-3, when they will be
discussed in more detail.
Table 5-2 Derivatives of inverse circular functions

$\dfrac{d}{dx}(\text{arc sin } x/a) = \dfrac{1}{\sqrt{(a^2 - x^2)}}$ for $-a < x < a$

$\dfrac{d}{dx}(\text{arc cos } x/a) = \dfrac{-1}{\sqrt{(a^2 - x^2)}}$ for $-a < x < a$

$\dfrac{d}{dx}(\text{arc tan } x/a) = \dfrac{a}{a^2 + x^2}$ for all x

$\dfrac{d}{dx}(\text{arc cosec } x/a) = \dfrac{-a}{|x| \sqrt{(x^2 - a^2)}}$ for $|x| > a$

$\dfrac{d}{dx}(\text{arc sec } x/a) = \dfrac{a}{|x| \sqrt{(x^2 - a^2)}}$ for $|x| > a$

$\dfrac{d}{dx}(\text{arc cot } x/a) = \dfrac{-a}{a^2 + x^2}$ for all x
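The entries of Table 5-2 can be spot-checked against difference quotients. The sketch below rests on several assumptions that are not in the text: a is taken to be the positive constant 2, and the principal branches are taken as arc sec u = arc cos (1/u), arc cosec u = arc sin (1/u) and arc cot u = $\pi/2$ - arc tan u. The names `ROWS`, `central` and `max_error` are hypothetical helpers.

```python
import math

A = 2.0  # assumed positive value of the constant a

def arc_sec(u):
    return math.acos(1.0 / u)            # assumed principal branch, values in [0, pi]

def arc_cosec(u):
    return math.asin(1.0 / u)            # assumed principal branch, values in [-pi/2, pi/2]

def arc_cot(u):
    return math.pi / 2.0 - math.atan(u)  # assumed principal branch, values in (0, pi)

# name -> (function of x, tabulated derivative, sample points inside the stated range)
ROWS = {
    "arc sin": (lambda x: math.asin(x / A), lambda x: 1.0 / math.sqrt(A * A - x * x), (-1.2, 0.7)),
    "arc cos": (lambda x: math.acos(x / A), lambda x: -1.0 / math.sqrt(A * A - x * x), (-1.2, 0.7)),
    "arc tan": (lambda x: math.atan(x / A), lambda x: A / (A * A + x * x), (-3.0, 0.5)),
    "arc cosec": (lambda x: arc_cosec(x / A), lambda x: -A / (abs(x) * math.sqrt(x * x - A * A)), (-3.0, 5.0)),
    "arc sec": (lambda x: arc_sec(x / A), lambda x: A / (abs(x) * math.sqrt(x * x - A * A)), (-3.0, 5.0)),
    "arc cot": (lambda x: arc_cot(x / A), lambda x: -A / (A * A + x * x), (-3.0, 0.5)),
}

def central(g, x, h=1e-6):
    """Central-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

def max_error(name):
    func, deriv, points = ROWS[name]
    return max(abs(central(func, x) - deriv(x)) for x in points)

for name in ROWS:
    print(name, max_error(name))
```

The sample points on either side of the origin exercise both of the cases x > a and x < -a that the factor |x| is intended to cover.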
In Chapter 2 we saw that curves may be described parametrically thus:
$$x = X(t), \quad y = Y(t), \qquad \text{(5-44)}$$
where t is a parameter defined in some interval J. The question that now
arises is how may we find dy/dx in terms of the functions X(t) and Y(t).
Let us suppose that X(t) and Y(t) are differentiable functions of t with
continuous derivatives and that $X'(t) \neq 0$. Then by Theorem 5-23, we may
solve x = X(t) in the form t = f(x), say, so that then y = Y[f(x)]. From
Theorem 5-7 on the differentiation of composite functions we have
$$\frac{dy}{dx} = \frac{d}{dx} Y[f(x)] = \frac{dY}{df}\frac{df}{dx}$$
or, equivalently,
$$\frac{dy}{dx} = \frac{dy}{dt}\frac{dt}{dx}. \qquad \text{(5-45)}$$
However, by Corollary 5-23, dt/dx = 1/(dx/dt) so that
$$\frac{dy}{dx} = \frac{dy}{dt} \Big/ \frac{dx}{dt}. \qquad \text{(5-46)}$$
Hence, like x and y, the derivative dy/dx is now also known parametrically
in terms of t.
This result is best remembered in symbolic operator form:
$$\frac{d}{dx} = \frac{1}{(dx/dt)}\frac{d}{dt}. \qquad \text{(5-47)}$$
Higher order derivatives with respect to x may be found either by a
repetition of the argument leading to Eqn (5-46), or by successive applications
of Eqn (5-47).
Thus, using Eqn (5-47), we have
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{1}{(dx/dt)}\left[\frac{d}{dt}\left(\frac{dy}{dt} \Big/ \frac{dx}{dt}\right)\right]$$
or, denoting differentiation with respect to t by a dot,
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{1}{\dot{x}}\frac{d}{dt}\left(\frac{\dot{y}}{\dot{x}}\right).$$
Using the fact that $dy/dx = \dot{y}/\dot{x}$ and performing the indicated differentiations
gives
$$\frac{d^2y}{dx^2} = \frac{\dot{x}\ddot{y} - \dot{y}\ddot{x}}{\dot{x}^3}. \qquad \text{(5-48)}$$
It is recommended that the reader remembers the arguments leading to the
operator rule (5-47) together with the rule itself, rather than remembering
results of the form (5-48).
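Example 5-26 If $x = t + 2\sin t$, $y = \cos t$ determine dy/dx and $d^2y/dx^2$
and hence deduce their values when t = 0.
Solution We have
$$\frac{dx}{dt} = 1 + 2\cos t, \quad \frac{dy}{dt} = -\sin t,$$
so that by Eqn (5-46)
$$\frac{dy}{dx} = \frac{dy}{dt} \Big/ \frac{dx}{dt} = \frac{-\sin t}{1 + 2\cos t}.$$
When t = 0 we have x = 0, y = 1 and
$$\left[\frac{dy}{dx}\right]_{t=0} = \left[\frac{-\sin t}{1 + 2\cos t}\right]_{t=0} = 0.$$
Next, as
$$\frac{d^2y}{dx^2} = \frac{1}{(dx/dt)}\frac{d}{dt}\left(\frac{dy}{dx}\right)$$
we have
$$\frac{d^2y}{dx^2} = \frac{1}{1 + 2\cos t}\frac{d}{dt}\left[\frac{-\sin t}{1 + 2\cos t}\right].$$
Thus, performing the differentiation and simplifying,
$$\frac{d^2y}{dx^2} = -\left[\frac{2 + \cos t}{(1 + 2\cos t)^3}\right]$$
and so
$$\left[\frac{d^2y}{dx^2}\right]_{t=0} = -\left[\frac{2 + \cos t}{(1 + 2\cos t)^3}\right]_{t=0} = -\frac{1}{9}.$$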
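The operator rule (5-47) lends itself to a direct numerical check of this example. In the sketch below the function names and the step length are assumptions made here for illustration. The closed form $-(2 + \cos t)/(1 + 2\cos t)^3$, obtained by carrying out the differentiation, is compared with the value found by differencing dy/dx with respect to t and dividing by dx/dt.

```python
import math

def dx_dt(t):
    return 1.0 + 2.0 * math.cos(t)

def dy_dx(t):
    """First derivative from Eqn (5-46)."""
    return -math.sin(t) / dx_dt(t)

def d2y_dx2(t):
    """Closed form obtained by performing the differentiation."""
    return -(2.0 + math.cos(t)) / dx_dt(t) ** 3

def d2y_dx2_operator(t, h=1e-5):
    """Operator rule (5-47) applied to dy/dx, with d/dt replaced by a central difference."""
    return (dy_dx(t + h) - dy_dx(t - h)) / (2.0 * h) / dx_dt(t)

for t in (0.0, 0.5, 1.0):
    print(t, dy_dx(t), d2y_dx2(t), d2y_dx2_operator(t))
```

At t = 0 the second derivative evaluates to -1/9, in agreement with the difference quotient.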
5-11 Higher order partial derivatives
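If the function f(x, y) is differentiable with continuous first-order derivatives
$f_x$ and $f_y$, then it can also happen that these partial derivatives, which are
functions of x and y, are themselves differentiable. Thus we are led to consider
the further partial derivatives
$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \quad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right), \quad \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right), \quad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right).$$
These functions, when they exist, are second-order partial derivatives of f
and are respectively denoted by
$$\frac{\partial^2 f}{\partial x^2}, \quad \frac{\partial^2 f}{\partial y \partial x}, \quad \frac{\partial^2 f}{\partial x \partial y} \quad \text{and} \quad \frac{\partial^2 f}{\partial y^2}.$$
Using an alternative notation we often write these same derivatives as
$$f_{xx}, \quad f_{xy}, \quad f_{yx}, \quad \text{and} \quad f_{yy}.$$
In this notation the first suffix signifies the partial derivative of f that is to be
differentiated partially with respect to the second suffix. The centre pair of
derivatives are mixed second-order partial derivatives and it is conventional
that the order of x and y in corresponding mixed derivatives in the two
notations is interchanged. Thus we have,
$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial y \partial x} = f_{xy} \quad \text{but} \quad \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial x \partial y} = f_{yx}.$$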
It is important to notice that the double operations of partial differentiation
that lead to the mixed derivatives $f_{xy}$ and $f_{yx}$ are performed in different
orders. Consequently we have no right to expect that the derivatives that
result will be equal to one another. To emphasize this point we now write out
in full the limiting operations involved in arriving at $f_{xy}$ and $f_{yx}$:
$$f_{xy}(x_0, y_0) = \frac{\partial}{\partial y}[f_x(x, y)]_{(x_0, y_0)} = \lim_{k \to 0} \frac{1}{k}\left\{\lim_{h \to 0}\left[\frac{f(x_0 + h, y_0 + k) - f(x_0, y_0 + k)}{h}\right] - \lim_{h \to 0}\left[\frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}\right]\right\}$$
and so, writing
$$g(x_0, y_0, h, k) = f(x_0 + h, y_0 + k) - f(x_0, y_0 + k) - f(x_0 + h, y_0) + f(x_0, y_0),$$
we obtain the result
$$f_{xy}(x_0, y_0) = \lim_{k \to 0} \lim_{h \to 0} \frac{1}{hk}\, g(x_0, y_0, h, k), \qquad \text{(5-49)}$$
where the inner limit with respect to h is to be taken first. Exactly similar
reasoning gives the corresponding result
$$f_{yx}(x_0, y_0) = \lim_{h \to 0} \lim_{k \to 0} \frac{1}{hk}\, g(x_0, y_0, h, k). \qquad \text{(5-50)}$$
Here it is the inner limit with respect to k that is to be taken first.
The double limits used in Eqns (5-49) and (5-50) are called iterated limits
on account of the fact that they are taken sequentially so that their order is
important. They are not to be confused with the simple double limit of
Definition 3-8 into which questions of order do not enter.
Let us now explore the consequence of requiring one of the mixed
derivatives, say $f_{xy}$, to be continuous. This is, of course, the usual situation.
Definitions 3-8 and 3-9 imply that if $f_{xy}$ is continuous at $(x_0, y_0)$, then a limit
$L = f_{xy}(x_0, y_0)$ exists with the property that
$$L = \lim_{\substack{h \to 0 \\ k \to 0}} f_{xy}(x_0 + h, y_0 + k), \qquad \text{(5-51)}$$
where the question of the order in which the limits are to be taken does not
occur. Hence, as $f_{xy}(x_0, y_0)$ is also defined by Eqn (5-49) in which an iterated
limit is involved, the equating of these two results implies that if $f_{xy}$ is
continuous, then the order of the iterated limits in Eqn (5-49) is immaterial. Thus,
under the stated conditions, expressions (5-49) and (5-50) become identical
and the continuity of $f_{xy}$ implies not only the existence of $f_{yx}$, but also that
$f_{xy} = f_{yx}$. This establishes our next result.
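theorem 5-24 (equality of mixed derivatives) Let f(x, y) be a real valued
function of the real variables x, y, and let $f_x$, $f_y$, $f_{xy}$ exist and be continuous
in the neighbourhood of the point $(x_0, y_0)$. Then $f_{yx}$ also exists at $(x_0, y_0)$ and
$$\left.\frac{\partial^2 f}{\partial x \partial y}\right|_{(x_0, y_0)} = \left.\frac{\partial^2 f}{\partial y \partial x}\right|_{(x_0, y_0)}.$$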
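Still higher-order derivatives can be defined by an obvious extension of
the notation. Thus, for a suitably differentiable function f we may define the
third-order partial derivatives
$$f_{xxx}, \quad f_{yyx}, \quad f_{xyx}, \quad f_{yyy}, \quad \text{etc.}$$
If the higher-order derivatives involved are continuous then, by an
obvious extension of Theorem 5-24, the order of performing differentiations
may be disregarded. In the case of the mixed third-order partial derivative
$f_{xyx}$ this would imply that
$$f_{xyx} = \frac{\partial}{\partial x}\left[\frac{\partial}{\partial y}(f_x)\right] = \frac{\partial}{\partial y}\left[\frac{\partial}{\partial x}(f_x)\right] = f_{xxy}.$$
Hence, under these conditions, it is proper to extend the $\partial$ notation by
writing
$$\frac{\partial^3 f}{\partial x^3}, \quad \frac{\partial^3 f}{\partial x \partial y^2}, \quad \frac{\partial^3 f}{\partial x^2 \partial y}, \quad \frac{\partial^3 f}{\partial y^3}.$$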
Example 5-27 If $f(x, y) = x^4 + 2x^2y^2 + xy^4$ find the second- and third-order
partial derivatives of f.
Solution First-order derivatives:
$$f_x = 4x^3 + 4xy^2 + y^4, \quad f_y = 4x^2y + 4xy^3.$$
Second-order derivatives:
$$f_{xx} = 12x^2 + 4y^2, \quad f_{yy} = 4x^2 + 12xy^2,$$
$$f_{xy} = \frac{\partial}{\partial y}(f_x) = 8xy + 4y^3.$$
This mixed derivative is continuous, and so $f_{xy} = f_{yx}$. As a check in this case
we compute $f_{yx}$ directly:
$$f_{yx} = \frac{\partial}{\partial x}(f_y) = 8xy + 4y^3.$$
Third-order derivatives:
$$f_{xxx} = 24x, \quad f_{yyy} = 24xy, \quad f_{xyy} = \frac{\partial}{\partial y}(f_{xy}) = 8x + 12y^2,$$
$$f_{xxy} = \frac{\partial}{\partial y}(f_{xx}) = 8y.$$
The continuity of the third-order derivatives we have computed ensures the
existence and equality of the other corresponding third-order derivatives that
may be defined. Thus, for example, as $f_{xxy} = 8y$ is continuous, there is no
need to compute $f_{xyx}$, since it exists and is equal to $f_{xxy}$.
Example 5-28 Define the function f by the requirement
$$f(x, y) = \begin{cases} 0 & \text{if both } x = 0 \text{ and } y = 0, \\ \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{otherwise.} \end{cases}$$
Deduce the value of each of the mixed derivatives at the origin.
Solution We shall use definitions (5-49) and (5-50) for this purpose by setting
$x_0 = 0$, $y_0 = 0$ so that
$$g(0, 0, h, k) = \frac{hk(h^2 - k^2)}{h^2 + k^2}.$$
Then, from Eqn (5-49),
$$f_{xy}(0, 0) = \lim_{k \to 0} \lim_{h \to 0} \frac{1}{hk}\left[\frac{hk(h^2 - k^2)}{h^2 + k^2}\right] = \lim_{k \to 0} \lim_{h \to 0} \frac{h^2 - k^2}{h^2 + k^2} = \lim_{k \to 0}\left(\frac{-k^2}{k^2}\right) = -1.$$
However, because the order of the iterated limits is reversed in Eqn (5-50),
the same argument also shows that
$$f_{yx}(0, 0) = \lim_{h \to 0} \lim_{k \to 0} \frac{h^2 - k^2}{h^2 + k^2} = \lim_{h \to 0}\left(\frac{h^2}{h^2}\right) = 1.$$
Thus $f_{xy}(0, 0) = -1$ whereas $f_{yx}(0, 0) = 1$. This occurs because the functions
$f_{xy}$ and $f_{yx}$ are not continuous at (0, 0) as may be checked by direct calculation.
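The dependence on the order of the iterated limits can be seen numerically. The sketch below is an illustration only: it imitates an iterated limit by taking the inner increment many orders of magnitude smaller than the outer one, and the two step sizes, like the function names, are assumptions chosen here.

```python
def f(x, y):
    """The function of Example 5-28."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def g(h, k):
    """g(0, 0, h, k) as defined before Eqn (5-49)."""
    return f(h, k) - f(0.0, k) - f(h, 0.0) + f(0.0, 0.0)

def f_xy_origin(outer=1e-3, inner=1e-9):
    """Eqn (5-49): inner limit in h first, imitated by taking h much smaller than k."""
    h, k = inner, outer
    return g(h, k) / (h * k)

def f_yx_origin(outer=1e-3, inner=1e-9):
    """Eqn (5-50): inner limit in k first, imitated by taking k much smaller than h."""
    h, k = outer, inner
    return g(h, k) / (h * k)

print(f_xy_origin())  # close to -1
print(f_yx_origin())  # close to +1
```

Taking h and k equal instead gives the value 0, which is a further indication that no simple double limit exists at the origin.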
PROBLEMS
Section 5-1
5-1 Give examples of four physical quantities that are essentially defined in
terms of a derivative.
5-2 Use Definitions 5-1 and 5-2 to prove that the following functions are differ-
entiable in the stated intervals and to compute their derivatives. Evaluate
these derivatives for the stated values:
(a) /(x) = 3x* in [o, 3], find/'(2);
(b) /(x) - 2x» + x + 1 in [-1, 4]) find/'(3);
(c) f(x) = |x| in (0, $\infty$), find f'(1);
(d) f(x) = |x| in ($-\infty$, 0), find f'(-3);
(e) f(x) = 1/x in [1, 5], find f'(4);
(f) $f(x) = x^{1/4}$ in (0, $\infty$), find f'(2).
5-3 Prove that f(x) = |x| is not differentiable at the origin.
5-4 Consider the graph of $f(x) = x^3 + x + 1$. Let $x_1$ and $x_2$ be two points on
the x-axis with the property that the gradient dy/dx of the curve y = f(x) at
$x = x_2$ is four times the gradient at $x = x_1$. Derive the algebraic equation
connecting $x_1$ and $x_2$ and deduce that $|x_1| > 1$.
5-5 Deduce the gradients of the functions f(x) to the immediate left and right of
x = 1 given that:
(a) $$f(x) = \begin{cases} x^3 + x + 1 & \text{for } x > 1 \\ 5 - x - x^2 & \text{for } x < 1; \end{cases}$$
(b) $$f(x) = \begin{cases} x^3 - x + 3 & \text{for } x > 1 \\ 2x + 1 & \text{for } x < 1. \end{cases}$$
5-6 Prove that the function f defined by $f(x) = x^2 \sin(1/x)$ for $x \neq 0$ and
f(0) = 0 is differentiable at the origin and find the value of its derivative
there.
5-7 Prove from first principles that d/dx (cos ax) = -a sin ax.
5-8 At which points in the stated intervals, if any, are the following functions
f(x) non-differentiable:
(a) $f(x) = x + \sin 2x$ for $0 < x < \pi$;
(b) $$f(x) = \begin{cases} x + 1/x & \text{for } x \neq 0 \\ 0 & \text{for } x = 0 \end{cases} \quad \text{in the interval } [-1, 1];$$
(c) $$f(x) = \begin{cases} 1 & \text{for } x \text{ rational} \\ 0 & \text{for } x \text{ irrational} \end{cases} \quad \text{in the interval } [0, 1].$$
5-9 The function /(>:) is defined on the interval < x < 1 by the expression
{sin 2x for < x < Jn-
ax + b for \v < x <> 1.
Deduce the values of a and b in order that the function should be continuous
and have a continuous derivative at x = Jw. Interpret these conditions
geometrically.
5-10 Give an example of a continuous function / defined on the interval [0, 5],
that is differentiable everywhere except at x = 1 at which point the left-hand
derivative is 3 and the right-hand derivative is 5. That is to say, the tangent
line to the graph drawn to the left of x = 1 has gradient 3 whilst the tangent
line to the graph drawn to the right of x = 1 has gradient 5.
Section 5-2
5-11 By assuming Theorem 5-2 is also valid for rational n where necessary, find
the derivatives of the following functions, stating at which points in their
domains of definition, if any, they are non-differentiable:
t \ rt \ i xX ' 3 + cos 3x > for * * °\ ■ u ■ i ,
(a)/(x)= } in the interval —$* <, x < v ;
10, for x = 0J
(b) f{x) = x sin 2x + x 5 ' 3 for -1 < x < 3;
(c) f(x) = | cos x | for <, x <. it.
5-12 Use Theorem 5-4 to give an inductive proof that, if k_1, k_2, . . ., k_n are
constants and f_1(x), f_2(x), . . ., f_n(x) are differentiable functions in the interval
a ≤ x < b, then
\frac{d}{dx} \sum_{i=1}^{n} k_i f_i(x) = \sum_{i=1}^{n} k_i f_i'(x) in a ≤ x < b.
5-13 Differentiate the following functions:
(a) y = x^{1/3} sin x;
(b) y = (x^2 + 3x + 1)(1 + cos 2x);
(c) y = sin 6x cos 2x;
(d) y = (x^3 + 2x - 1) cos 3x.
5-14 Differentiate the following functions by making a repeated application of
Theorem 5-5:
(a) y = (1 + x 2 ) sin 7x cos 4x;
(b) y = (1 + 2x2 + X 4)S.
(c) y = cos^3 2x;
(d) y = (1 + x^3)^2 sin^2 3x.
5-15 Differentiate these composite functions:
(a) y = (x^2 + 2x + 1)^{3/2};
PROBLEMS / 259
(b) y = (a + bx^3)^{1/3};
(c) y = (2 + 3 sin 2x)^5;
(d) y = sin (1 + 2x^3);
(e) y = sin [sin (1 + x^2)];
(f) y = cos (1 + x^4)^{1/2}.
5-16 Differentiate these quotients:
(a) y = (x 2 + 3x+ 7)/(x 4 + 1);
sin (1 + x 2 )
(b)y =
(c)y =
(d) y =
(e)j- =
x 4 + 2x 2 + 6'
1 1_ .
3 cos 3 x cos x'
tan(l - x 2 + x 4 )
sin (x 2 + 1) '
1 + Vx
1 - V*
5-17 Differentiate these functions:
1
(1 -3cosx) 2 '
x
tan (1 + x 2 + x 4 )
(b)y=*
(C) y = ^~ sin (1 + x 2 )
(d) j = cosec 2 (l + 3x);
sin x + 2 cos x
(e) y = ^ 5
17 sin x — 2 cos x
<S)y =
/3x - 1\
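5-18 If the functions f_1(x), f_2(x), g_1(x), and g_2(x) are differentiable, show by direct
expansion that this theorem is true:
\frac{d}{dx} \begin{vmatrix} f_1(x) & f_2(x) \\ g_1(x) & g_2(x) \end{vmatrix} = \begin{vmatrix} f_1'(x) & f_2'(x) \\ g_1(x) & g_2(x) \end{vmatrix} + \begin{vmatrix} f_1(x) & f_2(x) \\ g_1'(x) & g_2'(x) \end{vmatrix}.
Apply this result to differentiate the determinants: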
(a)
x 2 x sin x
cosx 1
fl>)
(1 + x 2 cos x) (2 - sin 2 x)
(1 - x 2 cos x) (2 + sin 2 x)
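5-19 Suppose that the functions f_{ij}(x), with i, j = 1, 2, or 3, are differentiable
functions of x. Prove, by means of Problem 5-18, that
\frac{d}{dx} \begin{vmatrix} f_{11}(x) & f_{12}(x) & f_{13}(x) \\ f_{21}(x) & f_{22}(x) & f_{23}(x) \\ f_{31}(x) & f_{32}(x) & f_{33}(x) \end{vmatrix}
= \begin{vmatrix} f_{11}'(x) & f_{12}'(x) & f_{13}'(x) \\ f_{21}(x) & f_{22}(x) & f_{23}(x) \\ f_{31}(x) & f_{32}(x) & f_{33}(x) \end{vmatrix}
+ \begin{vmatrix} f_{11}(x) & f_{12}(x) & f_{13}(x) \\ f_{21}'(x) & f_{22}'(x) & f_{23}'(x) \\ f_{31}(x) & f_{32}(x) & f_{33}(x) \end{vmatrix}
+ \begin{vmatrix} f_{11}(x) & f_{12}(x) & f_{13}(x) \\ f_{21}(x) & f_{22}(x) & f_{23}(x) \\ f_{31}'(x) & f_{32}'(x) & f_{33}'(x) \end{vmatrix}.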
Section 5-3
5-20 Use the intermediate value theorem to prove that if f(x) is continuous on
[a, b], with f(a) and f(b) having opposite signs, then there must be at least
one point x = ξ, with a < ξ < b, for which f(ξ) = 0.
5-21 Why is it not possible to conclude from the intermediate value theorem that
if f(x) = 1/(1 - |x|) for |x| ≠ 1 and f(x) = 0.5 for |x| = 1 then
(a) there is no point x = ξ in the interval [0, 6] for which f(ξ) = 0;
(b) yet there is a point x = η in the interval [-11, -2] for which f(η) =
-0.5? Identify the point on the x-axis giving rise to this functional value.
5-22 The function f(x) = (1/3)x^3 - x + 2 which is defined in the interval (-∞, ∞)
has extrema at the points x = 1, x = -1. Identify their nature by considering
the behaviour of the function close to these points. Are they relative or
absolute extrema?
5-23 By considering the behaviour of f(x) = sin £x cos \x in the neighbourhood
of x — in, show that the function attains an absolute maximum at that point.
5-24 By considering the behaviour of y = x 2 — 2x + 3 in the neighbourhood of
x = 1, prove that this point gives rise to an absolute minimum of the
function. Find its value.
5-25 Find the critical points of the function f(x) = x 3 — x 2 — 4x + 4. Identify
the nature of the extrema associated with them by considering the functional
behaviour close to each of these points.
5-26 Find the critical point of the function f(x) = (x — i)x 213 and identify its
nature. Do the points x = — 1, x = correspond to extrema of the function
and, if so, of what type are they?
5-27 Find the critical points of the function f(x) = x 2 (3 — x) 2 .
5-28 Identify the critical points and extrema of the function
lx 2 -3x + 2 for < x < 2-5
U 2 - 7x + 12 for 2-5 < x <. 5.
5-29 Apply Rolle's theorem to the following functions where it is applicable, and
hence determine at how many points in the stated intervals [a, b] the following
functions satisfy the result of that theorem:
(a) /(*) = x* - 1 in [-2, 2];
(b) f(x) = 1 + sin x in [— 2w, 3*-];
(c) f{x) = 1/(1 + I x |) in [-1,1];
(d)/w = (* 2 +3* + 2for-l<;^0
U 2 - 3* + 2 for 0< x < 1.
5-30 Give an example of a simple continuous function g(x) of the type illustrated
in Fig. 5-9 (b) in which g'(ξ) = 0 for some point ξ in an interval [a, b], but to
which Rolle's theorem is inapplicable because g(x) is non-differentiable at
one point of that interval.
5-31 Show that the conditions of the mean value theorem apply to /(x) = x +
sin x for the interval [0, £w]. Find the value of I in the statement of the
theorem.
5-32 In the proof of Theorem 5-12 a function F(x) was constructed on the interval
[a, b] which had the property that F(a) = F(b) = 0 and, in addition,
satisfied the other conditions of Rolle's theorem. Repeat the proof of Theorem
5-12, but this time with the requirement that F(a) = F(b) = K, where K is
an arbitrary non-zero constant.
The following four problems illustrate how the mean value theorem may be
used to estimate the behaviour of functions in closed intervals.
5-33 Let f(x) be a differentiable function having a monotonic increasing derivative
in the interval [a, b]. Then by writing the mean value theorem in the form
f(b) = f(a) + (b - a)f'(ξ), with a < ξ < b, prove that f(a) + (x - a)f'(a)
< f(x) < f(a) + (x - a)f'(b), for a < x < b. We shall agree to say that
these inequalities define upper and lower estimates of f(x) in [a, b]. Show
also that if f'(x) is monotonic decreasing, then the inequalities must be
reversed in the above expression.
5-34 Apply the result of Problem 5-33 to the function f(x) = sin x in the interval
[0, ½π] in order to prove that 0 < (sin x)/x < 1 for 0 < x < ½π.
5-35 Apply the result of Problem 5-33 to the function /(x) = (1 + * 2 ) 3/2 in the
interval [1, 2], thereby obtaining upper and lower estimates for it in that
interval.
5-36 If fix) = 1 + x + (1/5) sin 2 x, show that /'(*) is monotonic increasing in
the interval [— Jw, &*]. Hence apply the result of Problem 5-33 to/0:) to
obtain upper and lower estimates for /(x) in that interval. Evaluate the
inequalities for x = and x = i^ and compare the estimates with the exact
result.
5-37 Let the functions f(x) and g(x) be continuous in [a, b] and differentiable in
(a, b), with g'(x) non-zero in (a, b). Show that under these conditions Rolle's
theorem may be applied to the function F(x) defined by F(x) = f(a)g(a) - f(b)g(a)
+ [g(a) - g(b)]f(x) - [f(a) - f(b)]g(x), for a < x < b. Hence establish
the Cauchy extended mean value theorem.
5-38 By repeatedly applying L'Hospital's rule where necessary, evaluate the
following indeterminate forms of the type 0/0:
. tan ax „, ,. xcosx-sinx
(a) lim — — ; (b) lim —5 ;
x->0 X x-»0 X
tanx- sinx /JA ,.„ x 3 - 2x 2 - x + 2_
x — sin x
x 2 — sin 2 x
, s ,. tan x — sin x ... ,.
(c) lim : ; (d) lim
(e) lim
3_o x' sin-= x
5-39 Evaluate the following indeterminate forms which are of the type co/co:
(a) lim (7r/x)/cot *x/2; (b) lim tan x/tan 5x;
(c)iim 3x2 + x ~ 1 ; (d) Hm _£!*iL_
W JHT«, x 2 + 2 ' w ^o*-cotx
5-40 Explain the fallacy in this argument. The limit
\lim_{x \to \infty} \frac{x^2 + x \sin x + \sin x}{x^2}
does not exist because, applying Corollary 5-14 to L'Hospital's rule gives
\lim_{x \to \infty} \frac{x^2 + x \sin x + \sin x}{x^2} = \lim_{x \to \infty} \frac{2x + \sin x + x \cos x + \cos x}{2x}
= \lim_{x \to \infty} \left[ 1 + \frac{1}{2} \cos x + \frac{\sin x + \cos x}{2x} \right]
= 1 + \frac{1}{2} \lim_{x \to \infty} \cos x.
What is the true value of this limit?
5-41 Indeterminate limits of the form co — oo, . oo can be reduced to the types
0/0 or co/co by means of the following simple devices. If the limit is of the
type . oo set limf(x) = and lim £•(.*:) — >- co, then
x— *a x-*a
lim [f(x)g(x)] = lim [f{x)K\jg{x)} (type 0/0)
x-*a x-<-a
= lim [g{x)l{\lf{x))} (type co/co).
X-Hl
If the limit is of the type co — co set lim/(x) = 0, \img{x) = 0, then
lim
x-*a
"_1 1_"
= lim
x— >a
= lim
x-^a
x—*a
- g{x)-f{x) -
_ ftogV) .
}l(g(x) - flx)\
Apply these results to evaluate the following limits :
(a) lim l-X- - -) ; (b)
x ^o \sin x xj
(type 0/0)
(type co/co).
limP-- , 5 V
x-*3 \x — 3 x 2 — x — 6 J
(c) lim (1 — cos x) cot x;
x-*Q
ttX
(e) lim (1 — x) tan — ;
x^l 2
(d) lim xsin-;
X— *-oo X
(f)
i im /_iL__^y
x _j„ \^cot x 2 cos xj
5-42 Verify the nature of the extrema in Problem 5-20 using the results of Theorem
5-15.
5-43 Verify the nature of the extrema in Problem 5-23 using the results of Theorem
5-15.
5-44 Apply to Problem 5-26 the modification to Theorem 5-15 indicated at the
end of Example 5-10 (b) to identify the behaviour of the function at the origin.
5-45 Apply Theorem 5-15 to Problem 5-26 to identify the extrema occurring in
the interval (0, 5].
5-46 If y = f(x), where f is a differentiable function, find the differential dy given
that:
(a) f(x) = x^6 + 3x^2 + x + 6;
(b) f(x) = x sin (x^2 + 1);
(d) /(*) = (1 + X 2 ) 1 /*.
5-47 Metals A and B have coefficients of linear expansion α and β, respectively.
That is to say, when the temperature changes by an amount t from the
ambient value T_0, the linear dimensions of metal A change by a factor
(1 + αt), whilst those of metal B change by a factor (1 + βt). Suppose that a
block of metal A contains a cylindrical cavity of height H_0 and radius R at
temperature T_0 which is empty apart from a cylinder of metal B which has
height h_0 and radius r_0 at that same temperature. Obtain an approximate
expression for the small volume change dV of the cavity between the cylinders
consequent upon a small change of temperature dt.
Section 5-4
5-48 Compute the first and second derivatives of the functions /(x) listed below:
(a) fix) = tan x;
(b) f{x) = x 2 sin x;
(c) f(x) = (1 + x)(3 sin x + cos 2x);
id) fix) = (x» + I) 1 ' 2 ;
(e) fix) = sin (1 + x 2 );
(f) /foe) -tan-
5-49 Show that if f(x) = (1/2)(3x^2 - 1), then
(1 - x^2)f''(x) - 2xf'(x) + 6f(x) = 0.
Equations of this type are called second order ordinary differential equations,
and this one is a special case of Legendre's differential equation.
5-50 If f(x) = (1/2)(5x^3 - 3x) and g(x) = (1/2)(3x^2 - 1), find the algebraic equation
connecting f'(x), g'(x), and f(x).
5-51 Show that the function fix) defined below is continuous and has a con-
tinuous first derivative at x = 1, but that it has a discontinuous second de-
rivative at that point :
for x < 1
for x > 1.
= (x* + x* - x + 1
' \2x 3 - x 2 + x
5-52 Use Leibnitz's theorem to evaluate the third derivatives of the following
functions:
(a) f{x) = TTx'' (b) f(x) = ( * ? ~ 1} tan x '
(c) fix) = sin 2 x; (d) fix) = x 3 sec 2x.
5-53 Apply Theorems 5-17 and 5-18 to locate and identify the extrema and points
of inflection of the following functions, using your results to determine the
gradients at the points of inflection :
(a) fix) = 2x 3 + 3a: 2 - 12* + 5;
(c) fix) = x \x - 12) 2 .
5-54 Use the mean value theorem to prove that if f(x) has a maximum at x = x_0,
then near to x_0, f''(x) < 0. Show that if f(x) has a minimum at x = x_0, then
near to x_0, f''(x) > 0. Hence show that these tests may be used to identify
maxima and minima, even when f'(x_0) does not exist.
5-55 Apply the results of Problem 5-54 to prove that the function f(x) =
(3x - 1)x^{2/3} has a maximum at the origin.
5-56 Determine the values of a and b in order that f(x) = x^3 + ax^2 + bx + 1
should have a point of inflection at x = 2 at which the gradient of the
tangent to the graph is -3.
Section 5-5
5-57 Compute the derivatives/* and// given that:
(a)/(x,j) = x 2 jy;
(b) f(x, y) = 3x 2 y + (x + y) 2 x + 1 ;
(c) f(x,y) = sin (.x*+y z );
(d) f{x, y) = x cos (1+ x 2 y 2 ).
5-58 Given that
f(x, y) = x^3 + 3x^2 y + 4xy^2 + 2y^3
prove that x f_x + y f_y = 3f.
5-59 Compute the derivatives f x , fy, fz given that:
(&)f(x,y,z) = x 2 yz + —j
(b) /(x, y, z) = x cos yz + y cos xz + z cos xy;
(c) f{x, y, z) = cos (x 2 + xy + yz).
5-60 Show that if
fix, y, Z) = ( x2 + yS + z 2)3/2
then xfx + yfy + zf z = —If.
5-61 Show that if
fix,y,z) = x + j-l
then/* +/„+/*= 1.
5-62 Show that if
f = (x - y)(y - z)(z - x)
then f_x + f_y + f_z = 0.
Section 5-6
5-63 Find the total differential d« given that u = f(x,y, z), where:
ia)fix,y,z) = ^- z +xyz;
(b) fix x y,z) = x sin iy 2 + z 2 );
(c) fix, y, z) = (l -x 2 -y 2 - z 2 ) 3 ! 2 .
5-64 The speed of wave propagation u in a transmission line with inductance L
and capacitance C is given by the equation u = (LC)^{-1/2}. Relate the differential
du to the differentials dL and dC. How must dL and dC be related if
u is to remain constant?
5-65 Apply the triangle inequality |a + b| ≤ |a| + |b| to establish that, if
u = f(x_1, x_2, . . ., x_n) is differentiable with respect to each of its independent
variables x_1, x_2, . . ., x_n, then
|du| ≤ \left| \frac{\partial f}{\partial x_1} \right| |dx_1| + \left| \frac{\partial f}{\partial x_2} \right| |dx_2| + \cdots + \left| \frac{\partial f}{\partial x_n} \right| |dx_n|.
A triangle with sides of length a, b, c has area A = √[s(s - a)(s - b)(s - c)],
where 2s = a + b + c is the perimeter. If s is kept constant, find the largest
possible value that may be assumed by |dA|, the absolute value of the area
differential dA, consequent upon changes in the differentials da, db, and dc.
Apply the result to an equilateral triangle in which a = b = c = 4, when
changes da = 0.01, db = 0.015, and dc = -0.025 are made.
5-66 Compute dy/dx from the following implicit relationships:
(a) x 2 + y 2 = 4;
(b) x sin xy = 1 ;
(c) x 2 y + 2xy 2 + y 3 = 2.
5-67 Compute ∂z/∂x and ∂z/∂y given that:
(a) x 2 + y 2 + z 2 = 1 ;
(b) xyz + sin xz 2 = 2;
(c) x 2 - 2y 2 + 3z 2 - yz + y = 0;
(d) x cosy + y cos z + z cos x = 1.
Section 5-7
5-68 Find the envelope of the family of curves with parameter a
(x - a) 2 +y 2 = a 2 /2.
5-69 Find the envelope of the family of curves with parameter a
. 3
y=txx + —
2a
5-70 When a particle is projected into the air with velocity V at an angle θ to the
horizontal then, neglecting air resistance, its height y when distant x from the
point of projection is given by
y = x \tan θ - \frac{g x^2}{2V^2 \cos^2 θ}.
By regarding θ as a parameter, show that the envelope of the family of
trajectories for 0 < θ < π is a parabola, and find its equation. This is usually
called the parabola of safety because no projectile can penetrate beyond it.
5-71 Find the envelope of the family of curves with parameter a specified by
(x - a)^2 + (y + a)^2 - a^2 = 0.
5-72 Show that the envelope of the family of curves with parameter a defined by
x cos a + y sin a = 2
is a circle. Find its centre and radius. Interpret this family geometrically.
Section 5-8
5-73 Find du/dt given that:
(a) u = xy + sin (x 2 + y 2 ) with x = It, y = (1 + t 2 ) 1 ' 2 ;
(b) u = (1 + x 2 + y 2 ) 3 ' 2 with x = t(l+ t), y = t 3 ;
(c) « =
with x = 3 cos f, y = 3 sin t, z = f 2 .
(x 2 + j- 2 ) 1 ' 2
5-74 If u = x 2 — xy + y 3 , compute du/dt at points on the curve specified para-
metrically by the equations x = 2t + 1, y = t 2 + t — 2.
5-75 Prove that if u = f(2x^2 + y^2), where f is a differentiable function, then
y \frac{\partial u}{\partial x} - 2x \frac{\partial u}{\partial y} = 0.
[Hint: Set t = 2x^2 + y^2.]
5-76 If « =f(x,y), compute du/dx given that:
(a) f{x, y) = (1 + xy + x 2 ) where y = tan (-) ;
(b) f(x,y) = (1 + x 2 — y 2 ) 3 / 2 where y = cos 3x;
(c) f(x, y) = x cos y + y cos x — 1 where _y = 1 + sin 2 x.
5-77 If u = f(x, y) and g(x, y) = 0 are differentiable functions, compute du/dx
given that:
(a) f(x, y) = x^3 + 3xy + y^3 and g(x, y) = x cos y + y cos x - 2;
(b) f(x, y) = x^2 y^2 + sin xy and g(x, y) = x^2 - 2y^2 - 3.
5-78 If u = x 2 — xy + y 2 , determine du/dx at points on the ellipse 2x 2 + 3y 2 = 1.
Section 5-9
[Figure: the point P(r, θ, φ) in spherical polar coordinates]
5-79 In spherical polar coordinates a point P is specified in space by giving the
ordered number triple (r, φ, θ). Here r is the radial distance of P from the
origin, φ is the azimuthal angle of P measured anti-clockwise from the
x-axis in the (x, y)-plane, and θ is the acute angle between the radius vector
drawn to P from the origin and the z-axis. (See Figure.)
It is easily seen that:
x = r sin θ cos φ,
y = r sin θ sin φ,
z = r cos θ.
If f(x, y, z) is differentiable with respect to x, y, and z, express ∂f/∂r, ∂f/∂θ,
and ∂f/∂φ in terms of ∂f/∂x, ∂f/∂y, and ∂f/∂z. Find their values given that
f(x, y, z) = x^2 + 2xy + yz + z^2.
5-80 Given that f(x, y, z) = x^2 + xy + sin yz, compute ∂f/∂r, ∂f/∂θ, and ∂f/∂z',
where (r, θ, z') are the cylindrical polar coordinates corresponding to the
point (x, y, z).
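5-81 The notion of a Jacobian extends to transformations involving more than two
variables. If, in Theorem 5-22, m = n = 3, the Jacobian or functional
determinant is
\frac{\partial(x_1, x_2, x_3)}{\partial(\alpha_1, \alpha_2, \alpha_3)} = \begin{vmatrix} \partial x_1/\partial \alpha_1 & \partial x_2/\partial \alpha_1 & \partial x_3/\partial \alpha_1 \\ \partial x_1/\partial \alpha_2 & \partial x_2/\partial \alpha_2 & \partial x_3/\partial \alpha_2 \\ \partial x_1/\partial \alpha_3 & \partial x_2/\partial \alpha_3 & \partial x_3/\partial \alpha_3 \end{vmatrix}.
Evaluate the Jacobian ∂(x, y, z)/∂(r, θ, z') for the transformation from
Cartesian to cylindrical polar coordinates.
5-82 Use the definition in Problem 5-81 to evaluate the Jacobian ∂(x, y, z)/∂(r, φ, θ)
for the transformation from Cartesian to spherical polar coordinates.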
5-83 Find the Jacobians of the following transformations, stating where, if at all,
they vanish:
(a) x = 2u + 3v + 1, y = 3u - 2v - 1;
(b) x = u 2 - v 2 ,y = u 2 + v 2 ;
(c) x = u 2 + 2uv + v 2 , y = u.
5-84 Use Theorem 5-22 with n = 2, m = 3 to determine 8fj8u, 8f/8v, and 8fj8w,
given that:
f=x 2 + iy 2
where x = u 2 + v + w and y = uvw.
5-85 If u and v are functions of x and y which satisfy u^2 - v^2 + 2x + 3y = 0 and
uv + x - y = 0, find ∂u/∂x, ∂u/∂y, ∂v/∂x, and ∂v/∂y in terms of u and v.
5-86 Prove that if z = /(«, v), where u = x + 3t, v = y — It, then
— - -\—- 1 —
8t 8x 8y
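5-87 Show that if u = 1/r^n, where r^2 = x^2 + y^2 + z^2, then
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = \frac{n(n - 1)}{r^{n+2}}.
5-88 Prove that if u = 2xy + x f(y/x), then
x \frac{\partial u}{\partial x} + y \frac{\partial u}{\partial y} = u + 2xy.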
Section 5-10
5-89 Which of the following implicit functions f(x, y) = 0 may be solved explicitly
for y in the neighbourhood of the stated points (x_0, y_0):
(a) f(x, y) = x^2 + y^3 + xy - 11 at (1, 2);
(b) f(x,y) = (l-x 2 - y 2 y> 2 at (-1, 0);
(c) f(x, y) = sin xy - 1 at (1, ½π);
(d) f(x, y) = y + sin xy - 2 at (½π, 1)?
5-90 Compute dx/dy for each of the following relationships :
(a) y = 1 -f x 2 + x sin x;
(b) y = (1 - x + x 2 yi 2 ;
(c) j = x + tan x.
5-91 Differentiate these functions:
(a) /(x) = x 2 arc sec (x/a);
(b) fix) = (x 2 + x + l)/arc sin (x 2 - 2);
(c) f(x) = (1 + x + arc cos 2x)^{3/2}.
5-92 Compute dy/dx and d 2 //dx 2 for each of the following parametrically defined
curves:
(a) x = t - 1, y = t 3 ;
(b) x = cos 3 /, y — 2 sin 3 1 ;
(c) , = arc cos _I_ , = arc sin ^^-^
(d) x = 2(cos t + t sin 0, y = 2(sin f — t cos f).
5-93 Compute dy/dx and d 2 j/dx 2 at f = |w if x = / — sin t and j = 2(1 — cos /).
5-94 Compute d 3 yjdx 3 when t = 1, given that x = 2f + 1, y = f(l + t 2 ).
5-95 In Example 5-21, an envelope is specified in terms of a parameter a, and it
comprises two curves corresponding to the + and — signs associated with y.
Find the gradient of each of these curves at the origin (that is, corresponding
to a = 0).
Section 5-11
5-96 Compute ∂^2 z/∂x^2, ∂^2 z/∂x∂y, ∂^2 z/∂y∂x, and ∂^2 z/∂y^2 for each of the following
functions and hence show that ∂^2 z/∂x∂y = ∂^2 z/∂y∂x:
(a) z = (x^2 + y^2)^{1/2};
(b) z = x cos y + y cos x;
(c) z = arc tan (y/x).
5-97 Compute /c*(l, l),/ty(l, 1), and/^,(l, 1) given that
fix,y) = U+x)Hl+y) 3 .
Is ∂^2 f/∂x∂y = ∂^2 f/∂y∂x? Give reasons for your answer.
5-98 Given that
f(x,y) = * 2 + j 2
{ 1 for x = 0, y =
compute 8 2 f\8x8y stating, with reasons, when it is equal to 8 2 fj8y8x. Is there
any point at which this result is not true and, if so, what property of the
function invalidates the result? [Hint: Consider limits taken along the line
y = mx.]
5-99 Show that if w = arc tan (x/y), then ∂^2 w/∂x^2 + ∂^2 w/∂y^2 + ∂^2 w/∂z^2 = 0.
5-100 Given that V = arc tan 2xy\{x 2 — y % \ prove that
8V 8V 8 2 V 8 2 V
5-101 Compute 8 a zl8 x 8y 2 and 8 3 z/8x 2 8y given that z = x A y 2 + sin x 2 y.
Exponential, hyperbolic,
and logarithmic functions
6-1 The exponential function
This chapter will be concerned primarily with the exponential function,
first introduced in connection with limits in Section 3-3 and, thereafter, with
a number of related functions. This time our approach will be to utilize both
geometrical ideas and the elementary calculus to produce a more useful form
of definition than that contained in Eqn (3-6).
Let us seek a function E(x) equal to its own derivative and such that
E(0) = 1. Specifically, we must solve the equation
E'(x) = E(x) (6-1)
which, because it involves the unknown function E{x) together with its
derivative, is called a differential equation. This differential equation has the
following simple geometrical interpretation : if the graph of the function E(x)
is drawn, then the gradient of the graph at the point (x, E(x)) is equal to the
functional value of E(x) itself.
Perhaps it is worth remarking that Eqn (6-1), taken together with the
condition E(0) = 1, immediately implies that E(x) is a convex function for
x > 0. No deduction can yet be made about its behaviour for x < 0 though,
in fact, we shall shortly prove that E(x) is a convex function for all x.
As on previous occasions, our desired result is soonest obtained by
studying an artificial function. The reason for considering the precise form
of function to be used will become apparent once the result has been obtained.
Suppose, for the moment, that there is a unique function E(x) defined by
our requirements, and consider the new function F(x), where
F(x) = E(x)E(a - x). (6-2)
Then,
F'(x) = E(x) \frac{d}{dx}[E(a - x)] + E(a - x) \frac{d}{dx}[E(x)]
which, using the defining property (6-1), becomes
F'(x) = -E(x)E(a - x) + E(a - x)E(x) = 0.
Consequently, F(x) = constant but, as F(0) = E(0)E(a) = E(a), it follows
at once that F(x) = F(0) = E(a) for all x, and thus Eqn (6-2) takes the form
E(x)E(a - x) = E(a).
SEC 6-1 THE EXPONENTIAL FUNCTION / 271
Alternatively, by replacing a by a + b and x by b this may be written
E(a + b) = E(a)E(b). (6-3)
Hence, if n is a positive integer,
E(n) = E(n - 1)E(1) = E(n - 2)(E(1))^2 = · · · = (E(1))^n. (6-4)
If, now, we denote E(1) by the symbol e, then Eqn (6-4) is equivalent to
E(n) = e^n. (6-5)
The fact that E(0) = 1 taken together with Eqn (6-1) implies E(1) > 1, which
in turn implies, via Eqn (6-5), that e^n → ∞ as n → ∞.
Again,
E(-n)E(n) = E(0) = 1,
so that
E(-n) = \frac{1}{E(n)} = \frac{1}{e^n} = e^{-n}. (6-6)
Now we must extend this notation to take account of rational and
irrational x. Let us consider E(x) for rational x, so that x = p/q with p, q
integers. Then, using Eqn (6-5), we may write
[E(p/q)]^q = E(p) = e^p
and so
E(p/q) = e^{p/q}. (6-7)
A similar argument using Eqn (6-6) shows that
E(-p/q) = e^{-p/q}. (6-8)
Thus we have shown that for all rational x
E(x) = e^x. (6-9)
To extend the definition of E(x) to all the real numbers x and not just to the
rationals, it only remains to add that for any irrational number ξ, we define
E(ξ) by the equation E(ξ) = e^ξ.
Although the foregoing arguments have established the algebraic properties
of E(x), they have still not provided a method of attributing an actual number
to E(x) for any given value of x. Nor, indeed, are we certain that only one
function E(x) exists that satisfies Eqn (6-1) and is such that E(0) = 1; that
is to say, is E(x) unique? This question will be answered in the affirmative
272 / EXPONENTIAL, HYPERBOLIC, AND LOGARITHMIC FUNCTIONS CH 6
immediately following the next stage of our argument.
We now seek a series solution to our function E(x) of the form
y = \sum_{r=0}^{\infty} a_r x^r (6-10)
where, for simplicity, we have set y = E(x) so that Eqn (6-1) now becomes
\frac{dy}{dx} = y (6-11)
with y(0) = 1.
Assuming that this infinite series may be differentiated termwise, we have
\frac{dy}{dx} = \sum_{r=0}^{\infty} r a_r x^{r-1},
so that substituting for y and dy/dx in Eqn (6-11) yields
\sum_{r=0}^{\infty} r a_r x^{r-1} = \sum_{r=0}^{\infty} a_r x^r
or, equivalently,
\sum_{r=0}^{\infty} (r + 1) a_{r+1} x^r = \sum_{r=0}^{\infty} a_r x^r. (6-12)
For this result to be unconditionally true for all x, as it must be to satisfy
our definition of E(x), it follows that it must be an identity in x. This can
only be possible if the coefficients of the corresponding powers of x on each
side of Eqn (6-12) are identical. Hence, equating the coefficients of the
general term involving x^r, we find that
(r + 1)a_{r+1} = a_r (6-13)
for r = 0, 1, 2, . . . .
As we require that y(0) = 1, it follows by setting x = 0 in Eqn (6-10)
that a_0 = 1. Using this result together with Eqn (6-13), which defines the
coefficients a_r recursively, it is easily seen that
a_0 = 1, a_1 = 1, a_2 = 1/2!, a_3 = 1/3!, . . ., a_r = 1/r!.
Substitution of these coefficients into Eqn (6-10) then shows that
E(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^r}{r!} + \cdots (6-14)
whatever this expression may mean.
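As an illustrative sketch only, the recurrence (6-13) can be run mechanically to confirm that it generates the coefficients 1/r!. The function name series_coefficients is a label chosen here for illustration and is not part of the text; exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from math import factorial


def series_coefficients(count):
    """Return a_0 .. a_(count-1) from a_0 = 1 and (r + 1) a_(r+1) = a_r."""
    coeffs = [Fraction(1)]  # a_0 = 1 follows from y(0) = 1
    for r in range(count - 1):
        # Eqn (6-13) rearranged: a_(r+1) = a_r / (r + 1)
        coeffs.append(coeffs[r] / (r + 1))
    return coeffs


coeffs = series_coefficients(8)
# Each coefficient should coincide with 1/r!
matches = [c == Fraction(1, factorial(r)) for r, c in enumerate(coeffs)]
print(coeffs[:5], all(matches))
```

The list comparison at the end checks the closed form a_r = 1/r! term by term for the first eight coefficients.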
We have already remarked that the sum of an infinite series is to be
interpreted as the limit of the partial sums of the series, so let us now consider
the nth partial sum
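S_n = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^{n-1}}{(n-1)!} (6-15)
of the function E(x).
If x > 0 then S_{n+1} - S_n = x^n/n! > 0, so that {S_n} is increasing. Is {S_n}
bounded? Let R be an integer greater than 2x, then x/r < 1/2 for r ≥ R, and so
\frac{x^r}{r!} = \frac{x}{1} \cdot \frac{x}{2} \cdots \frac{x}{R-1} \cdot \frac{x}{R} \cdots \frac{x}{r} < \frac{x^{R-1}}{(R-1)!} \left( \frac{1}{2} \right)^{r-R+1}.
Thus
S_n = \sum_{r=0}^{R-1} \frac{x^r}{r!} + \sum_{r=R}^{n-1} \frac{x^r}{r!} < \sum_{r=0}^{R-1} \frac{x^r}{r!} + \frac{x^{R-1}}{(R-1)!} \sum_{r=R}^{n-1} \left( \frac{1}{2} \right)^{r-R+1},
which shows that {S_n} is bounded, because the geometric sum on the right is
less than 1. Hence by the postulate of Section 3-2 it
follows that \lim_{n \to \infty} S_n exists, and we now define the sum of the infinite series
(6-14) to be equal to the value of this limit. The infinite series (6-14) is thus
defined for all positive x.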
As we have agreed to write E(1) = e, it follows from Eqn (6-14), by
setting x = 1, that
e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!} + \cdots (6-16)
which, to 15 decimal places, has the numerical value
e = 2.718281828459045.
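The monotonicity and boundedness argument for the partial sums, and the numerical value of e, can be checked with a short computation. This is a sketch under stated assumptions: the names partial_sum and upper_bound are chosen here for illustration, the bound is the one obtained by replacing the geometric sum by 1, and x = 1 with R = 3 is just one convenient choice satisfying R > 2x.

```python
from fractions import Fraction
from math import e, factorial


def partial_sum(x, n):
    """S_n = 1 + x + x^2/2! + ... + x^(n-1)/(n-1)!, as in Eqn (6-15)."""
    return sum(x**r / factorial(r) for r in range(n))


def upper_bound(x, R):
    """First R terms of the series plus x^(R-1)/(R-1)! (geometric sum < 1)."""
    return partial_sum(x, R) + x**(R - 1) / factorial(R - 1)


x = Fraction(1)   # exact arithmetic, so the comparisons below are not blurred by rounding
R = 3             # any integer greater than 2x will do
sums = [partial_sum(x, n) for n in range(1, 21)]
bound = upper_bound(x, R)
approx_e = float(sums[-1])

print(bound, approx_e, e)
```

With these choices the bound evaluates to 3, every partial sum lies below it, and twenty terms already agree with the library value of e to within floating-point precision.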
A modified argument shows that E(x) is also defined for all negative x,
so that taking account of Eqn (6-9) we have proved the following result:
Theorem 6-1 (exponential theorem) For all x it is true that if
e = \sum_{n=0}^{\infty} \frac{1}{n!}
then
e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}.
Let us now dispel any lingering doubts there may be about the uniqueness
of e^x. Suppose there is a different solution z = E(x) of Eqn (6-1), with
z(0) = 1. Then we must have
\frac{dz}{dx} = z (6-17)
and so, differencing Eqns (6-11) and (6-17), it is easily shown that
\frac{dw}{dx} = w, (6-18)
where w = y - z. We also have w(0) = y(0) - z(0) = 0. Now solving
Eqn (6-18) by the same device as before, but this time setting
w = \sum_{r=0}^{\infty} b_r x^r, (6-19)
we arrive at the recurrence relation
(r + 1)b_{r+1} = b_r, (6-20)
for r = 0, 1, 2, . . ., which is strictly analogous to Eqn (6-13).
However, setting x = 0 in Eqn (6-19) and using the condition w(0) = 0
we find that b_0 = 0, and so it follows from Eqn (6-20) that all the coefficients
b_r are zero. Hence from Eqn (6-19) we see that w(x) = 0, and thus y = z,
showing that the function e^x defined by Eqn (6-14) is unique.
Finally, it remains for us to establish the equivalence of the function
E(x) defined by Eqn (3-6) and the one denoted by the same symbols in Eqn
(6-14). We shall only give the details for positive x. Our best method is first
to expand Eqn (3-6), obtaining
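Then, setting E_{n+1} = [1 + (x/n)]^n, we rewrite the result in the form
E_{n+1} = 1 + x + \frac{x^2}{2!} \left( 1 - \frac{1}{n} \right) + \frac{x^3}{3!} \left( 1 - \frac{1}{n} \right) \left( 1 - \frac{2}{n} \right) + \cdots
+ \frac{x^n}{n!} \left( 1 - \frac{1}{n} \right) \left( 1 - \frac{2}{n} \right) \cdots \left( 1 - \frac{n-1}{n} \right). (6-21)
Defining the number g(r, n) by
g(r, n) = \left( 1 - \frac{1}{n} \right) \left( 1 - \frac{2}{n} \right) \cdots \left( 1 - \frac{r}{n} \right),
we next write Eqn (6-21) as
E_{n+1} = 1 + x + \frac{x^2}{2!} g(1, n) + \frac{x^3}{3!} g(2, n) + \cdots + \frac{x^n}{n!} g(n - 1, n). (6-22)
Now the difference S_{n+1} - E_{n+1} is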
S_{n+1} - E_{n+1} = \frac{x^2}{2!} (1 - g(1, n)) + \frac{x^3}{3!} (1 - g(2, n)) + \cdots + \frac{x^n}{n!} (1 - g(n - 1, n))
which is obviously positive since 0 < g(r, n) < 1.
However, it is readily seen that for any given r
\lim_{n \to \infty} g(r, n) = 1,
showing that
\lim_{n \to \infty} (S_{n+1} - E_{n+1}) = 0.
From Theorem 3-1 (a) it then follows that
\lim_{n \to \infty} E_{n+1} = \lim_{n \to \infty} S_{n+1} = e^x
thereby establishing the equivalence of our two alternative definitions when
x is positive. A similar argument also establishes the equivalence when x is
negative.
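The comparison between S_{n+1} and E_{n+1} lends itself to a quick numerical illustration. The sketch below is not a proof; it merely tabulates the gap for one positive x, with the function names series_sum and compound and the sample value x = 1.5 chosen here for illustration.

```python
def series_sum(x, n):
    """S_(n+1): the series (6-14) truncated after the term x^n/n!."""
    total = term = 1.0
    for r in range(1, n + 1):
        term *= x / r          # builds x^r/r! from the previous term
        total += term
    return total


def compound(x, n):
    """E_(n+1) = (1 + x/n)^n, the expression expanded in Eqn (6-21)."""
    return (1.0 + x / n) ** n


x = 1.5
gaps = [series_sum(x, n) - compound(x, n) for n in (10, 100, 1000)]
print(gaps)
```

For positive x each gap is positive, as the text asserts, and the gaps shrink as n grows, which is consistent with the two definitions sharing the same limit.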
Having now achieved a working definition for E(x) we shall henceforth
always denote this function, known as the exponential function, either by e x
or by exp (x).
It is worth formally recording the differentiability properties of this
function e^x. However, we first remark that if f(x) = e^{g(x)}, where g(x) is a
differentiable function of x, then, setting g(x) = u so that f(x) = e^u and
using the chain rule in the form displayed in Eqn (5-6), we find that
\frac{df}{dx} = \frac{df}{du} \cdot \frac{du}{dx} = e^u g'(x) = g'(x) e^{g(x)}.
Theorem 6-2 If f(x) = e^{g(x)}, where g(x) is a differentiable function of x,
then
\frac{d}{dx} \{ e^{g(x)} \} = g'(x) e^{g(x)}.
In particular, if g(x) = αx, where α is a constant, then
\frac{d}{dx} (e^{αx}) = α e^{αx}.
Let us now establish an important property of e^x. Consider the quotient
e^x/x^p, where p is any positive integer. Then from Eqn (6-14) it follows that,
\frac{e^x}{x^p} = \frac{1}{x^p} \left( 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^p}{p!} + \frac{x^{p+1}}{(p+1)!} + \cdots \right) > \frac{x}{(p+1)!}.
Hence we have shown that
\lim_{x \to \infty} \frac{e^x}{x^p} > \lim_{x \to \infty} \frac{x}{(p+1)!} → ∞.
We have proved the following result:
Theorem 6-3 The function e^x increases more quickly than any positive
power of x as x → ∞.
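A brief numerical check of the inequality behind Theorem 6-3 may help fix ideas. This is only a sketch: the power p = 5 and the sample points are arbitrary choices made here, and ratio is a name introduced for illustration.

```python
from math import exp, factorial


def ratio(x, p):
    """The quotient e^x / x^p considered in the text."""
    return exp(x) / x**p


p = 5
xs = [10.0, 20.0, 40.0, 80.0]
# Pair each quotient with the lower bound x/(p + 1)! obtained from the series
checks = [(ratio(x, p), x / factorial(p + 1)) for x in xs]
for quotient, lower in checks:
    print(quotient, lower)
```

At each sample point the quotient exceeds the lower bound x/(p + 1)!, and since that bound grows without limit, so must the quotient.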
We have already noted that e^x → ∞ as x → ∞, and as e^x = 1/e^{-x} it follows
that \lim_{x \to -\infty} e^x = 0 or, equivalently, \lim_{x \to \infty} e^{-x} = 0. From Theorem 6-1 it follows
that the function e^x is everywhere positive and since, by virtue of its definition,
its derivative is everywhere a strictly monotonic increasing function of x it
must be a convex function. A graph of e^x is shown in Fig. 6-1.
These last properties are frequently of help when studying limiting prob-
lems involving the exponential function, as illustrated in the following
examples.
Fig. 6-1 The exponential function
Example 6-1 Deduce the values of the following limits:
(a) \lim_{x \to \infty} \frac{3e^x + x^3 + 1}{2e^x + x^7};
(b) \lim_{x \to \infty} \frac{2e^{2x} + x^2 + 2}{3e^{3x} + 7};
(c) \lim_{x \to 0} \frac{e^{ax} - e^{bx}}{2x}.
SEC 6-2 DIFFERENTIATION OF EXPONENTIAL FUNCTION / 277
Solution (a) We have
\frac{3e^x + x^3 + 1}{2e^x + x^7} = \frac{3 + (x^3/e^x) + (1/e^x)}{2 + (x^7/e^x)}
and from Theorem 6-3 it then follows that all but the initial terms in numerator
and denominator must vanish as x → ∞, so that
\lim_{x \to \infty} \frac{3e^x + x^3 + 1}{2e^x + x^7} = \frac{3}{2}.
(b) In this case we have
\frac{2e^{2x} + x^2 + 2}{3e^{3x} + 7} = \frac{2e^{-x} + (x^2/e^{3x}) + (2/e^{3x})}{3 + (7/e^{3x})}.
However, this time as x → ∞ the whole numerator tends to zero whilst the
denominator approaches the value 3. Hence we have
\lim_{x \to \infty} \frac{2e^{2x} + x^2 + 2}{3e^{3x} + 7} = 0.
(c) This limit involves an indeterminate form of the type 0/0, so we
appeal to Theorem 5-14. Writing f(x) = e^{ax} - e^{bx} and g(x) = 2x we see that
f(0) = g(0) = 0, and
\lim_{x \to 0} \frac{f'(x)}{g'(x)} = \lim_{x \to 0} \frac{ae^{ax} - be^{bx}}{2} = \frac{a - b}{2}.
Hence, by the conditions of Theorem 5-14,
\lim_{x \to 0} \frac{e^{ax} - e^{bx}}{2x} = \lim_{x \to 0} \frac{ae^{ax} - be^{bx}}{2} = \frac{a - b}{2}.
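The three limits of Example 6-1 can be sanity-checked by evaluating each quotient at a suitably large or small argument. The sketch below is an illustration rather than part of the solution; the function names, the evaluation points, and the sample constants a = 3, b = 1 are assumptions made here.

```python
from math import exp


def quotient_a(x):
    return (3 * exp(x) + x**3 + 1) / (2 * exp(x) + x**7)


def quotient_b(x):
    return (2 * exp(2 * x) + x**2 + 2) / (3 * exp(3 * x) + 7)


def quotient_c(x, a, b):
    return (exp(a * x) - exp(b * x)) / (2 * x)


value_a = quotient_a(100.0)           # expected to be close to 3/2
value_b = quotient_b(50.0)            # expected to be close to 0
value_c = quotient_c(1e-6, 3.0, 1.0)  # expected to be close to (a - b)/2 = 1
print(value_a, value_b, value_c)
```

The arguments are kept moderate so that exp does not overflow in double precision; the printed values should sit near 3/2, 0, and (a - b)/2 respectively.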
6-2 Differentiation of functions involving the
exponential function
The exponential function occurs frequently in mathematics, and all of its
differentiability properties follow from Theorem 6-2 combined with the
fundamental differentiation theorems of Chapter 5. These results are straight-
forward and are best illustrated by examples. The first example illustrates the
ordinary differentiation of simple combinations of functions.
Example 6-2 Differentiate the following functions f(x):
(a) f(x) = 2x^2 + 3e^{2x};
(b) f(x) = x^2 e^{3x};
(c) f(x) = 2 exp (x^3 + 2x + 1);
(d) f(x) = e^{2x}/(1 + e^x);
(e) f(x) = sin (1 + e^{2x}).
Solution
(a) f{x) = ^ (2x 2 + 3e 2 *) = 4x + 3 4 (e 2 *)
dx dx
and so
/'(x) = 4 + 6e 2 *.
(b) /'(*) = e 3 * 4- (x 2 ) + x z 4 (e**)
dx dx
so that
/'(*) = 2xe 3 * + 3x 2 e 3 *.
(c) This is a more complicated example of a composite function or, more
simply, of a function of a function. Set u = x 3 + 2x + 1 so that
/(*) = 2e«.
Then, by the chain rule,
r(x) = v.*a.
JW du dx
but
df d dw
-^ = — (2e M ) = 2e" = 2 exp (x 3 + 2x + 1) and — = 3x 2 + 2
du d« dx
so that, finally,
fix) = (6x 2 + 4) exp (x 3 + 2x + 1).
(d) Writing /(x) in the form
f(x) = e 2 *(l + e*)- 1
we have
/'(*) = (1 + e*)- 1 ^ (e 2 *) + e 2 * £ [(1 + e*)" 1 ]
or
w-otV^k 1 *^-
To evaluate the last term set 1 + e* = u, so that we then need to evaluate
dx\u/
which, by the chain rule, is
SEC 6-2 DIFFERENTIATION OF EXPONENTIAL FUNCTION / 279
_d_
dx
C-)= ± (-)■-■
\u] du \uj dx
However, dujdx = e* and (d/d«)(l/M) = — (1/w 2 ) = —1/(1 + e*) 2 , showing
that
d — e*
[(1 + e*)-i] =
dx ' J (1 + e*) 2
Hence, combining our results, we find
2t 2x c 3x
f'(x) =
(1 + e*) (1 + e*) 2
(e) This is another composite function. Set u = 1 + e 2x , so that f(x)
= sin u. Proceeding as before we then see that
/'(*) = ~ ■ jp = 2e 2 * cos (1 + e 2 *).
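The derivatives found in Example 6-2 can be compared with a symmetric difference quotient. The Python sketch below is an illustration under stated assumptions, not part of the original text: the helper `central_diff`, the step size, and the sample point $x = 0.3$ are all choices made for this check, and part (e) is taken as $\sin(1 + e^{2x})$, as in the solution.

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Each entry pairs f(x) with the derivative obtained in the solution.
CASES = {
    "a": (lambda x: 2 * x**2 + 3 * math.exp(2 * x),
          lambda x: 4 * x + 6 * math.exp(2 * x)),
    "b": (lambda x: x**2 * math.exp(3 * x),
          lambda x: (2 * x + 3 * x**2) * math.exp(3 * x)),
    "c": (lambda x: 2 * math.exp(x**3 + 2 * x + 1),
          lambda x: (6 * x**2 + 4) * math.exp(x**3 + 2 * x + 1)),
    "d": (lambda x: math.exp(2 * x) / (1 + math.exp(x)),
          lambda x: 2 * math.exp(2 * x) / (1 + math.exp(x))
                    - math.exp(3 * x) / (1 + math.exp(x))**2),
    "e": (lambda x: math.sin(1 + math.exp(2 * x)),
          lambda x: 2 * math.exp(2 * x) * math.cos(1 + math.exp(2 * x))),
}

def relative_errors(x=0.3):
    """Relative gap between the numerical and the analytic derivative."""
    errors = {}
    for label, (f, df) in CASES.items():
        exact = df(x)
        errors[label] = abs(central_diff(f, x) - exact) / abs(exact)
    return errors

print(relative_errors())
```

Small relative errors for every part indicate that the product and chain rule computations above are consistent with the functions as stated.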
Higher order derivatives are defined, as usual, by repeating the differentia-
tion process the requisite number of times.
Example 6-3 Find $f''(x)$, given that:
(a) $f(x) = x^2e^{-2x}$;
(b) $f(x) = (x - 1)e^x$.
Solution (a) Proceeding as before we find that
$$f'(x) = 2xe^{-2x} - 2x^2e^{-2x},$$
and
$$f''(x) = 2e^{-2x} - 4xe^{-2x} - 4xe^{-2x} + 4x^2e^{-2x}.$$
Collecting terms we obtain
$$f''(x) = 2(1 - 4x + 2x^2)e^{-2x}.$$
(b) $$f'(x) = e^x + (x - 1)e^x = xe^x$$
so that
$$f''(x) = e^x + xe^x.$$
Partial differentiation of functions involving the exponential function is
also straightforward, as the following example indicates.
Example 6-4 Determine $f_x$, $f_y$, and $f_{xy}$, given that
$$f(x, y) = (x^2 + y^2)\exp(x^2 - y^2).$$
Solution
$$f_x = 2x\exp(x^2 - y^2) + (x^2 + y^2)\frac{\partial}{\partial x}[\exp(x^2 - y^2)]$$
$$= 2x\exp(x^2 - y^2) + 2x(x^2 + y^2)\exp(x^2 - y^2).$$
Notice that $\partial f/\partial x$ comprises the sum of everywhere continuous functions
and so is itself everywhere continuous.
$$f_y = 2y\exp(x^2 - y^2) + (x^2 + y^2)\frac{\partial}{\partial y}[\exp(x^2 - y^2)]$$
$$= 2y\exp(x^2 - y^2) - 2y(x^2 + y^2)\exp(x^2 - y^2).$$
The partial derivative $\partial f/\partial y$ is also seen to be everywhere continuous. Theorem
5-24 now tells us that $\partial^2 f/\partial x\,\partial y = \partial^2 f/\partial y\,\partial x$, so that we may differentiate either
$\partial f/\partial x$ or $\partial f/\partial y$ to arrive at $f_{xy}$. We choose to differentiate $f_x$ partially with
respect to $y$.
$$\frac{\partial^2 f}{\partial y\,\partial x} = -4xy\exp(x^2 - y^2) + 4xy\exp(x^2 - y^2) - 4xy(x^2 + y^2)\exp(x^2 - y^2),$$
whence
$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} = -4xy(x^2 + y^2)\exp(x^2 - y^2).$$
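As an informal check on Example 6-4, the mixed derivative can be approximated by a four-point difference formula and compared with the closed form just obtained. This Python sketch is illustrative only; the helper name `mixed_partial`, the step size, and the sample point are assumptions.

```python
import math

def f(x, y):
    # The function of Example 6-4.
    return (x**2 + y**2) * math.exp(x**2 - y**2)

def f_xy_formula(x, y):
    # Closed form of the mixed derivative derived in the solution.
    return -4 * x * y * (x**2 + y**2) * math.exp(x**2 - y**2)

def mixed_partial(g, x, y, h=1e-4):
    """Four-point central difference approximation to the mixed derivative."""
    return (g(x + h, y + h) - g(x + h, y - h)
            - g(x - h, y + h) + g(x - h, y - h)) / (4 * h * h)

x0, y0 = 0.5, 0.4
print(mixed_partial(f, x0, y0), f_xy_formula(x0, y0))
```

Because the difference formula is symmetric in $x$ and $y$, it makes no distinction between the two orders of differentiation, which is consistent with the appeal to Theorem 5-24.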
As a final illustration, let us consider an application of Theorem 5-21 to
the exponential function.
Example 6-5 Find $df/dt$, given that
$$f(x, y) = xy\exp(x^2 + 3y + 1),$$
with $x = \sin t$, $y = t^3 + 1$.
Solution Here we must use the chain rule formula for partial differentiation:
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\cdot\frac{dx}{dt} + \frac{\partial f}{\partial y}\cdot\frac{dy}{dt}.$$
Now
$$\frac{\partial f}{\partial x} = y\exp(x^2 + 3y + 1) + xy\frac{\partial}{\partial x}\exp(x^2 + 3y + 1)$$
$$= y\exp(x^2 + 3y + 1) + 2x^2y\exp(x^2 + 3y + 1),$$
and thus
$$\frac{\partial f}{\partial x} = y(1 + 2x^2)\exp(x^2 + 3y + 1).$$
Similarly,
$$\frac{\partial f}{\partial y} = x\exp(x^2 + 3y + 1) + xy\frac{\partial}{\partial y}\exp(x^2 + 3y + 1)$$
$$= x\exp(x^2 + 3y + 1) + 3xy\exp(x^2 + 3y + 1),$$
and thus
$$\frac{\partial f}{\partial y} = x(1 + 3y)\exp(x^2 + 3y + 1).$$
We also have
$$\frac{dx}{dt} = \cos t \quad\text{and}\quad \frac{dy}{dt} = 3t^2,$$
and so $df/dt$ may now be found by direct substitution into the chain rule
formula, with the following result:
$$\frac{df}{dt} = [(t^3 + 1)(1 + 2\sin^2 t)\cos t + 3t^2(3t^3 + 4)\sin t]\exp(4 + 3t^3 + \sin^2 t).$$
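The result of Example 6-5 can be tested by substituting $x = \sin t$, $y = t^3 + 1$ directly into $f$ and differentiating the resulting function of $t$ numerically. The Python sketch below is an illustration, not part of the original text; the names `F`, `dF_formula`, and `central_diff` and the sample value of $t$ are assumptions.

```python
import math

def f(x, y):
    return x * y * math.exp(x**2 + 3 * y + 1)

def F(t):
    # f evaluated along the curve x = sin t, y = t^3 + 1.
    return f(math.sin(t), t**3 + 1)

def dF_formula(t):
    # The expression for df/dt obtained from the chain rule.
    s = math.sin(t)
    bracket = ((t**3 + 1) * (1 + 2 * s**2) * math.cos(t)
               + 3 * t**2 * (3 * t**3 + 4) * s)
    return bracket * math.exp(4 + 3 * t**3 + s**2)

def central_diff(g, t, h=1e-6):
    return (g(t + h) - g(t - h)) / (2 * h)

t0 = 0.7
print(central_diff(F, t0), dF_formula(t0))
```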
6-3 The logarithmic function
Having introduced the exponential function there is now a need for an inverse
function. The implicit function theorem (Theorem 5-23) tells us that such an
inverse function exists and, furthermore, that it is differentiable whenever
$(d/dx)(e^x) \neq 0$. However, this is always the case since we have already seen
that $(d/dx)(e^x) = e^x$, which is never zero for $x$ in the interval $-\infty < x < \infty$.
Hence a differentiable function, inverse to the exponential function, exists for
all $x$. We call it the natural logarithmic function and denote it by $\log_e$ whenever
it is necessary to indicate that it has the base $e$.
Definition 6-1 We define the natural logarithmic function $\log_e x$ by the
requirement that
$$y = \log_e x \iff x = e^y.$$
We may use this definition, together with Corollary 1 to the implicit
function theorem, to compute the derivative of $\log_e x$. As $dy/dx = 1/(dx/dy)$
and $x = e^y$, it follows that $dx/dy = e^y$, whence
$$\frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}.$$
Now $e^y$ is essentially positive, so that
$$\frac{d}{dx}(\log_e x) = \frac{1}{x} \quad\text{for } x > 0. \qquad (6-23)$$
It is obvious that $\log_e 1 = 0$ and, as $x$ increases strictly monotonically
with $y$, it also follows that $\log_e x \to +\infty$ as $x \to +\infty$, and $\log_e x \to -\infty$ as
$x \to 0$.
Let us now prove that
$$\lim_{x\to\infty} \frac{\log_e x}{x^\alpha} = 0 \quad\text{for all } \alpha > 0.$$
As $x = e^y$ we have
$$\frac{\log_e x}{x^\alpha} = \frac{y}{e^{\alpha y}}$$
and so
$$\lim_{x\to\infty} \frac{\log_e x}{x^\alpha} = \lim_{y\to\infty} \frac{y}{e^{\alpha y}} = \frac{1}{\alpha}\lim_{y\to\infty} \frac{\alpha y}{e^{\alpha y}}.$$
Setting $u = \alpha y$ we arrive at
$$\lim_{x\to\infty} \frac{\log_e x}{x^\alpha} = \frac{1}{\alpha}\lim_{u\to\infty} \frac{u}{e^u} = 0,$$
by virtue of Theorem 6-3.
Collecting the previous results we arrive at the following theorem.
Theorem 6-4 If $y = \log_e x$, then
(a) $\dfrac{dy}{dx} = \dfrac{1}{x}$ for $x > 0$;
(b) $\lim_{x\to\infty} \dfrac{\log_e x}{x^\alpha} = 0$ for all $\alpha > 0$.
Logarithms to other bases can be used if convenient. They are defined as
follows.
Definition 6-2 We define the logarithmic function to the base $c$, denoted
by $\log_c x$ where $c$ is a positive number, by the requirement that
$$y = \log_c x \iff x = c^y.$$
For reference purposes we record the following familiar properties of the
logarithmic function, established in elementary courses.
Basic properties of the logarithmic function
Let $\log_e$ and $\log_c$ represent logarithms to the bases $e$ and $c$ respectively, and
$a$, $b$, $r$ be real numbers (with $a$, $b$ positive); then:
(a) $\log_e ab = \log_e a + \log_e b$;
(b) $\log_e a^r = r\log_e a$;
(c) $\log_c a = \dfrac{\log_e a}{\log_e c}$;
(d) $\log_c e = \dfrac{1}{\log_e c}$.
Results (c) and (d) quoted above are immediately useful if it is necessary
to differentiate $\log_a x$. For we have
$$\log_a x = \frac{\log_e x}{\log_e a}$$
so that
$$\frac{d}{dx}(\log_a x) = \frac{1}{\log_e a}\cdot\frac{d}{dx}(\log_e x)$$
whence,
$$\frac{d}{dx}(\log_a x) = \frac{1}{x\log_e a} = \frac{\log_a e}{x}. \qquad (6-24)$$
Let us now find the derivative of the function $a^x$, where $a$ is any positive
number. Notice first that, by virtue of Definition 6-1,
$$a = e^{\log_e a}$$
so that
$$a^x = (e^{\log_e a})^x = e^{x\log_e a}.$$
Now $\log_e a$ is simply a constant, so we have
$$\frac{d}{dx}(a^x) = \frac{d}{dx}(e^{x\log_e a}) = \log_e a\; e^{x\log_e a} = a^x\log_e a.$$
We have thus established the useful result
$$\frac{d}{dx}(a^x) = a^x\log_e a. \qquad (6-25)$$
This result can also be obtained in another manner. We set
$$y = a^x$$
so that taking the natural logarithm gives
$$\log_e y = x\log_e a.$$
Differentiating this result with respect to $x$ we obtain
$$\frac{d}{dx}(\log_e y) = \frac{d}{dx}(x\log_e a)$$
or
$$\frac{1}{y}\cdot\frac{dy}{dx} = \log_e a,$$
and so
$$\frac{dy}{dx} = \frac{d}{dx}(a^x) = y\log_e a = a^x\log_e a.$$
For our final general result we consider the differentiation of the function
$y = \log_e g(x)$, where $g(x)$ is a differentiable function. Setting $u = g(x)$ so
that $y = \log_e u$ and using the chain rule gives
$$\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} = \frac{1}{u}\,g'(x)$$
so that, finally,
$$\frac{d}{dx}[\log_e g(x)] = \frac{g'(x)}{g(x)}. \qquad (6-26)$$
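Results (6-24), (6-25), and (6-26) lend themselves to a quick numerical comparison. The following Python sketch is illustrative and not part of the original text; the base $a = 2.5$, the sample point, and the choice $g(x) = 3x^2 + 2$ are assumptions made for the check.

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

a, x0 = 2.5, 1.3  # assumed sample base and sample point

# (6-24): d/dx log_a x = 1 / (x log_e a)
err_624 = abs(central_diff(lambda x: math.log(x, a), x0)
              - 1 / (x0 * math.log(a)))

# (6-25): d/dx a^x = a^x log_e a
err_625 = abs(central_diff(lambda x: a**x, x0) - a**x0 * math.log(a))

# (6-26): d/dx log_e g(x) = g'(x)/g(x), with the assumed g(x) = 3x^2 + 2
def g(x):
    return 3 * x**2 + 2

def dg(x):
    return 6 * x

err_626 = abs(central_diff(lambda x: math.log(g(x)), x0) - dg(x0) / g(x0))

print(err_624, err_625, err_626)
```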
Henceforth, unless otherwise stated, the natural logarithm will always be
used, so for simplicity of notation we shall write log in place of $\log_e$. Often, in
other texts, the notation ln is used to denote the natural logarithmic function.
Let us now examine some representative cases of limits involving
logarithms.
Example 6-6 Evaluate the following limits:
(a) $\lim_{x\to\infty} \dfrac{\log x^3}{x}$;
(b) $\lim_{x\to\infty} \dfrac{\log a^x}{3x + 1}$ with $a > 0$;
(c) $\lim_{x\to\infty} \dfrac{1 + x^3\log[2 + (1/x)]}{3x^3 + 2x^2 + 1}$;
(d) $\lim_{x\to 0} \dfrac{\log(1 + 3x)}{2x}$.
Solution (a) We have
$$\frac{\log x^3}{x} = \frac{3\log x}{x}$$
so that by Theorem 6-4 (b) it follows at once that
$$\lim_{x\to\infty} \frac{\log x^3}{x} = 0.$$
(b) We have
$$\frac{\log a^x}{3x + 1} = \frac{x\log a}{3x + 1}$$
and so
$$\lim_{x\to\infty} \frac{\log a^x}{3x + 1} = \lim_{x\to\infty} \frac{x\log a}{3x + 1} = \frac{1}{3}\log a.$$
(c) Using the result
$$\frac{1 + x^3\log[2 + (1/x)]}{3x^3 + 2x^2 + 1} = \frac{(1/x^3) + \log[2 + (1/x)]}{3 + (2/x) + (1/x^3)}$$
it is at once apparent that
$$\lim_{x\to\infty} \frac{1 + x^3\log[2 + (1/x)]}{3x^3 + 2x^2 + 1} = \frac{1}{3}\log 2.$$
(d) This is an indeterminate form of the type 0/0. It is easily verified that
Theorem 5-14 (L'Hospital's rule) is applicable so that
$$\lim_{x\to 0} \frac{f(x)}{g(x)} = \lim_{x\to 0} \frac{f'(x)}{g'(x)}$$
with $f(x) = \log(1 + 3x)$ and $g(x) = 2x$. As $f'(x) = 3/(1 + 3x)$ and $g'(x) = 2$
it thus follows that
$$\lim_{x\to 0} \frac{\log(1 + 3x)}{2x} = \lim_{x\to 0} \frac{3}{2(1 + 3x)} = \frac{3}{2}.$$
Example 6-7 Determine the derivative $dy/dx$ for each of the following
functions $y = f(x)$ where:
(a) $f(x) = \log(3x^2 + 2)$;
(b) $f(x) = \log\tan 2x$;
(c) $f(x) = 3^x x^2$;
(d) $f(x) = (\sin x)^x$.
Solution (a) Here we must apply Eqn (6-26), with $g(x) = 3x^2 + 2$. As
$g'(x) = 6x$ it follows at once that
$$\frac{d}{dx}[\log(3x^2 + 2)] = \frac{6x}{3x^2 + 2}.$$
(b) Again we must use Eqn (6-26), but this time with $g(x) = \tan 2x$.
As $g'(x) = 2\sec^2 2x$, we have that
$$\frac{d}{dx}(\log\tan 2x) = \frac{2\sec^2 2x}{\tan 2x} = 2\sec 2x\,\mathrm{cosec}\,2x.$$
(c) We have
$$\frac{d}{dx}(3^x x^2) = 3^x\frac{d}{dx}(x^2) + x^2\frac{d}{dx}(3^x)$$
which, by virtue of Eqn (6-25), becomes
$$\frac{d}{dx}(3^x x^2) = 2x\cdot 3^x + x^2\,3^x\log 3$$
giving
$$\frac{d}{dx}(3^x x^2) = (2x + x^2\log 3)3^x.$$
(d) We set $y = (\sin x)^x$ and take logarithms to get
$$\log y = x\log\sin x.$$
Now, differentiating, we find that
$$\frac{1}{y}\cdot\frac{dy}{dx} = \log\sin x + x\frac{d}{dx}(\log\sin x)$$
or
$$\frac{dy}{dx} = (\sin x)^x(\log\sin x + x\cot x).$$
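The four derivatives of Example 6-7 can be compared against difference quotients. The Python sketch below is an illustration rather than part of the original text; the sample point $x = 0.5$ (where $\tan 2x$ and $\sin x$ are both positive) and the helper names are assumptions.

```python
import math

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

# Each entry pairs f(x) with the derivative found in the solution.
CASES = {
    "a": (lambda x: math.log(3 * x**2 + 2),
          lambda x: 6 * x / (3 * x**2 + 2)),
    "b": (lambda x: math.log(math.tan(2 * x)),
          # 2 sec 2x cosec 2x written with cos and sin
          lambda x: 2 / (math.cos(2 * x) * math.sin(2 * x))),
    "c": (lambda x: 3**x * x**2,
          lambda x: (2 * x + x**2 * math.log(3)) * 3**x),
    "d": (lambda x: math.sin(x)**x,
          lambda x: math.sin(x)**x
                    * (math.log(math.sin(x)) + x / math.tan(x))),
}

def absolute_errors(x=0.5):
    return {label: abs(central_diff(f, x) - df(x))
            for label, (f, df) in CASES.items()}

print(absolute_errors())
```

Part (d) is the one that depends on logarithmic differentiation, so agreement there is a useful check on the step $\log y = x\log\sin x$.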
Partial differentiation involving the logarithmic function is equally
straightforward. The final example illustrates a typical situation.
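Example 6-8 If $u = x\log[1 + (x/y)] + y\log[1 + (y/x)]$, show that
$$x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} = u.$$
Solution We start by computing $\partial u/\partial x$. It is readily seen that
$$\frac{\partial u}{\partial x} = \log\left(1 + \frac{x}{y}\right) + x\frac{\partial}{\partial x}\log\left(1 + \frac{x}{y}\right) + y\frac{\partial}{\partial x}\log\left(1 + \frac{y}{x}\right)$$
$$= \log\left(1 + \frac{x}{y}\right) + \frac{x}{y}\cdot\frac{1}{1 + (x/y)} + y\cdot\frac{1}{1 + (y/x)}\left(-\frac{y}{x^2}\right)$$
and so
$$\frac{\partial u}{\partial x} = \log\left(1 + \frac{x}{y}\right) + \frac{x}{x + y} - \frac{y^2}{x(x + y)}.$$
The symmetry of $x$ and $y$ in $u$ then allows us to interchange $x$ and $y$ in the
above partial derivative in order to derive $\partial u/\partial y$ without further calculation.
We obtain
$$\frac{\partial u}{\partial y} = \log\left(1 + \frac{y}{x}\right) + \frac{y}{x + y} - \frac{x^2}{y(x + y)}.$$
Hereafter, direct substitution verifies that
$$x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} = u.$$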
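The direct substitution at the end of Example 6-8 can be mirrored numerically. The Python sketch below is illustrative only and is not part of the original text; the function names and the sample points (all with $x, y > 0$) are assumptions.

```python
import math

def u(x, y):
    return x * math.log(1 + x / y) + y * math.log(1 + y / x)

def u_x(x, y):
    # Partial derivative with respect to x, as obtained in the solution.
    return math.log(1 + x / y) + x / (x + y) - y**2 / (x * (x + y))

def u_y(x, y):
    # Obtained from u_x by interchanging x and y.
    return math.log(1 + y / x) + y / (x + y) - x**2 / (y * (x + y))

def euler_residual(x, y):
    """x u_x + y u_y - u, which should vanish identically."""
    return x * u_x(x, y) + y * u_y(x, y) - u(x, y)

def central_diff_x(x, y, h=1e-6):
    return (u(x + h, y) - u(x - h, y)) / (2 * h)

for point in ((1.0, 2.0), (0.3, 4.0), (5.0, 0.7)):
    print(point, euler_residual(*point))
```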
6-4 Hyperbolic functions
It is useful to define new functions called the hyperbolic sine, written sinh x,
and the hyperbolic cosine, written cosh x, which are related to the exponential
function. This is achieved as follows.
Definition 6-3 (hyperbolic functions) For all real $x$ we define $\sinh x$ and
$\cosh x$ by the requirement that
$$\sinh x = \frac{e^x - e^{-x}}{2}, \qquad \cosh x = \frac{e^x + e^{-x}}{2}.$$
It is an immediate consequence of the series for $e^x$ and $e^{-x}$ that
$$\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + \cdots + \frac{x^{2n+1}}{(2n+1)!} + \cdots, \qquad (6-27)$$
and
$$\cosh x = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \frac{x^6}{6!} + \cdots + \frac{x^{2n}}{(2n)!} + \cdots. \qquad (6-28)$$
Furthermore, it also follows from Definition 6-3 that $\sinh x$ is an odd function
and $\cosh x$ is an even function.
We now define the hyperbolic tangent, cotangent, cosecant, and secant,
denoted by $\tanh x$, $\coth x$, $\mathrm{cosech}\,x$, and $\mathrm{sech}\,x$, as follows.
Definition 6-4
$$\tanh x = \frac{\sinh x}{\cosh x}; \qquad \coth x = \frac{\cosh x}{\sinh x}; \qquad \mathrm{cosech}\,x = \frac{1}{\sinh x}; \qquad \mathrm{sech}\,x = \frac{1}{\cosh x}.$$
We illustrate how useful identities may be established directly from
Definition 6-3. Let us prove that
$$\sinh a\cosh b + \cosh a\sinh b = \sinh(a + b).$$
Substituting for the hyperbolic sines and cosines from Definition 6-3 we obtain
$$\frac{e^a - e^{-a}}{2}\cdot\frac{e^b + e^{-b}}{2} + \frac{e^a + e^{-a}}{2}\cdot\frac{e^b - e^{-b}}{2} = \frac{e^{(a+b)} - e^{-(a+b)}}{2},$$
which proves our result since $[e^{(a+b)} - e^{-(a+b)}]/2 = \sinh(a + b)$. Similar
manipulation establishes the validity of all the identities listed below in
Table 6-1.
Table 6-1 Identities for hyperbolic functions
$\sinh(x \pm y) = \sinh x\cosh y \pm \cosh x\sinh y$; (6-29)
$\cosh(x \pm y) = \cosh x\cosh y \pm \sinh x\sinh y$; (6-30)
$\cosh^2 x - \sinh^2 x = 1$; (6-31)
$\tanh^2 x + \mathrm{sech}^2 x = 1$; (6-32)
$1 + \mathrm{cosech}^2 x = \coth^2 x$. (6-33)
Table 6-2 Derivatives of hyperbolic functions
$\dfrac{d}{dx}(\sinh x) = \cosh x$; (6-34)
$\dfrac{d}{dx}(\cosh x) = \sinh x$; (6-35)
$\dfrac{d}{dx}(\tanh x) = \mathrm{sech}^2 x$; (6-36)
$\dfrac{d}{dx}(\coth x) = -\mathrm{cosech}^2 x$; (6-37)
$\dfrac{d}{dx}(\mathrm{cosech}\,x) = -\mathrm{cosech}\,x\coth x$; (6-38)
$\dfrac{d}{dx}(\mathrm{sech}\,x) = -\mathrm{sech}\,x\tanh x$. (6-39)
Appeal to Definitions 6-3 and 6-4 together with the differentiability
properties of the exponential function establishes Table 6-2, the table of
derivatives.
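The entries of Tables 6-1 and 6-2 can be spot-checked numerically from Definitions 6-3 and 6-4. The Python sketch below is an illustration, not part of the original text; the helper names and the sample points are assumptions, and the reciprocal functions are built from `math.sinh` and `math.cosh` exactly as in Definition 6-4.

```python
import math

def sech(x):
    return 1 / math.cosh(x)

def cosech(x):
    return 1 / math.sinh(x)

def coth(x):
    return math.cosh(x) / math.sinh(x)

def identity_residuals(x, y):
    """Left side minus right side for (6-29) to (6-33), taking the + sign."""
    return [
        math.sinh(x + y) - (math.sinh(x) * math.cosh(y) + math.cosh(x) * math.sinh(y)),
        math.cosh(x + y) - (math.cosh(x) * math.cosh(y) + math.sinh(x) * math.sinh(y)),
        math.cosh(x)**2 - math.sinh(x)**2 - 1,
        math.tanh(x)**2 + sech(x)**2 - 1,
        1 + cosech(x)**2 - coth(x)**2,
    ]

def derivative_residuals(x, h=1e-6):
    """Difference quotient minus tabulated derivative for (6-34) to (6-39)."""
    pairs = [
        (math.sinh, math.cosh),
        (math.cosh, math.sinh),
        (math.tanh, lambda t: sech(t)**2),
        (coth, lambda t: -cosech(t)**2),
        (cosech, lambda t: -cosech(t) * coth(t)),
        (sech, lambda t: -sech(t) * math.tanh(t)),
    ]
    return [(f(x + h) - f(x - h)) / (2 * h) - df(x) for f, df in pairs]

print(identity_residuals(0.7, -1.2))
print(derivative_residuals(0.7))
```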
The behaviour of the hyperbolic functions is indicated graphically in
Fig. 6-2 and for comparison the graphs of $y = \tfrac{1}{2}e^x$ and $y = \tfrac{1}{2}e^{-x}$ have been
added to Fig. 6-2 (a).
Functions inverse to the hyperbolic sine and cosine are introduced
through the following definitions.
Definition 6-5 The inverse hyperbolic sine, arcsinh $x$, and the inverse
hyperbolic cosine, arccosh $x$, are defined by the relationships:
(a) $y = \operatorname{arcsinh} x \iff x = \sinh y$;
(b) $y = \operatorname{arccosh} x \iff x = \cosh y$.
Their derivatives are readily obtained by direct use of this definition and
we illustrate the process by deriving $(d/dx)\operatorname{arcsinh} x$.
If $y = \operatorname{arcsinh} x$, then $x = \sinh y$ and so, differentiating with respect to
$x$, we obtain
$$1 = \cosh y\,\frac{dy}{dx},$$
and so
$$\frac{dy}{dx} = \frac{1}{\cosh y} = \frac{1}{\sqrt{1 + \sinh^2 y}}$$
by virtue of identity (6-31) and the fact that $\cosh y$ is essentially positive.
Hence, using the fact that $x = \sinh y$, we find that
$$\frac{d}{dx}(\operatorname{arcsinh} x) = \frac{1}{\sqrt{1 + x^2}} \quad\text{for all } x.$$
In the case of $y = \operatorname{arccosh} x$ we must proceed with more care.
If $y = \operatorname{arccosh} x$, so that $x = \cosh y$, then, as before, differentiating with
respect to $x$ gives
$$1 = \sinh y\,\frac{dy}{dx}$$
or,
$$\frac{dy}{dx} = \frac{1}{\sinh y}.$$
Fig. 6-2 Hyperbolic functions: (a) $y = \sinh x$ and $y = \cosh x$; (b) $y = \tanh x$;
(c) $y = \coth x$; (d) $y = \mathrm{cosech}\,x$; (e) $y = \mathrm{sech}\,x$.
Now from the graph in Fig. 6-2 (a) we see that $\sinh y$ is positive if its argument
$\operatorname{arccosh} x > 0$ and negative if $\operatorname{arccosh} x < 0$. Thus two different inverse
functions must be defined.
If $\operatorname{arccosh} x > 0$, then
$$\frac{dy}{dx} = \frac{1}{\sinh y} = \frac{1}{\sqrt{\cosh^2 y - 1}} = \frac{1}{\sqrt{x^2 - 1}} \quad\text{for } x > 1.$$
Conversely, if $\operatorname{arccosh} x < 0$, then
$$\frac{dy}{dx} = \frac{1}{\sinh y} = \frac{-1}{\sqrt{\cosh^2 y - 1}} = \frac{-1}{\sqrt{x^2 - 1}} \quad\text{for } x > 1.$$
Table 6-3 Derivatives of inverse hyperbolic functions
$\dfrac{d}{dx}\left(\operatorname{arcsinh}\dfrac{x}{a}\right) = \dfrac{1}{\sqrt{x^2 + a^2}}$, for all $x$; (6-40)
$\dfrac{d}{dx}\left(\operatorname{arccosh}\dfrac{x}{a}\right) = \dfrac{1}{\sqrt{x^2 - a^2}}$, for $\operatorname{arccosh}\dfrac{x}{a} > 0$ and $\dfrac{x}{a} > 1$; (6-41)
$\dfrac{d}{dx}\left(\operatorname{arccosh}\dfrac{x}{a}\right) = \dfrac{-1}{\sqrt{x^2 - a^2}}$, for $\operatorname{arccosh}\dfrac{x}{a} < 0$ and $\dfrac{x}{a} > 1$; (6-42)
$\dfrac{d}{dx}\left(\operatorname{arctanh}\dfrac{x}{a}\right) = \dfrac{a}{a^2 - x^2}$, for $x^2 < a^2$; (6-43)
$\dfrac{d}{dx}\left(\operatorname{arccoth}\dfrac{x}{a}\right) = \dfrac{a}{a^2 - x^2}$, for $x^2 > a^2$; (6-44)
$\dfrac{d}{dx}\left(\operatorname{arccosech}\dfrac{x}{a}\right) = \dfrac{-a}{x\sqrt{x^2 + a^2}}$, for all $x$; (6-45)
$\dfrac{d}{dx}\left(\operatorname{arcsech}\dfrac{x}{a}\right) = \dfrac{-a}{x\sqrt{a^2 - x^2}}$, for $\operatorname{arcsech}\dfrac{x}{a} > 0$ and $0 < \dfrac{x}{a} < 1$; (6-46)
$\dfrac{d}{dx}\left(\operatorname{arcsech}\dfrac{x}{a}\right) = \dfrac{a}{x\sqrt{a^2 - x^2}}$, for $\operatorname{arcsech}\dfrac{x}{a} < 0$ and $0 < \dfrac{x}{a} < 1$. (6-47)
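Several entries of Table 6-3 can be checked against Python's built-in inverse hyperbolic functions. This sketch is an illustration, not part of the original text. It assumes $a = 2$ and positive sample points, uses the positive branches returned by `math.acosh`, and expresses arccoth, arccosech, and arcsech through the standard relations $\operatorname{arccoth} u = \operatorname{arctanh}(1/u)$, $\operatorname{arccosech} u = \operatorname{arcsinh}(1/u)$, $\operatorname{arcsech} u = \operatorname{arccosh}(1/u)$, which are assumptions not stated in the text.

```python
import math

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

A = 2.0  # assumed value of the constant a

# (equation label, function of x, tabulated derivative, sample point)
ROWS = [
    ("6-40", lambda x: math.asinh(x / A),
     lambda x: 1 / math.sqrt(x**2 + A**2), 1.5),
    ("6-41", lambda x: math.acosh(x / A),
     lambda x: 1 / math.sqrt(x**2 - A**2), 3.0),
    ("6-43", lambda x: math.atanh(x / A),
     lambda x: A / (A**2 - x**2), 0.5),
    ("6-44", lambda x: math.atanh(A / x),
     lambda x: A / (A**2 - x**2), 3.0),
    ("6-45", lambda x: math.asinh(A / x),
     lambda x: -A / (x * math.sqrt(x**2 + A**2)), 1.5),
    ("6-46", lambda x: math.acosh(A / x),
     lambda x: -A / (x * math.sqrt(A**2 - x**2)), 1.0),
]

def table_errors():
    """Absolute gap between difference quotient and tabulated derivative."""
    return {label: abs(central_diff(f, x) - df(x)) for label, f, df, x in ROWS}

print(table_errors())
```

Entries (6-42) and (6-47) refer to the negative branches and differ from (6-41) and (6-46) only in sign, so they are not listed separately here.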
Other inverse hyperbolic functions are defined similarly and it is left to
the reader to verify the remaining entries in Table 6-3. (In many books
the inverse function is denoted by a superscript $-1$, so that $\sinh^{-1} x$ is written
in place of arcsinh $x$, etc.)
The following examples are representative of the limiting and differenti-
ability problems encountered with hyperbolic functions.
Example 6-9
(a) Evaluate $\lim_{x\to\infty} \dfrac{5\sinh 3x + xe^x}{4e^{3x}}$;
(b) Find $f'(x)$ if $f(x) = \sinh(x^2 + 3x + 1)^{1/2}$;
(c) Find $f'(x)$ given that $f(x) < 0$ is given by $f(x) = \operatorname{arccosh}(\sin^2 x)$;
(d) Determine $f_x$ and $f_y$ given that $f(x, y) = xy\cosh(x^2 + y^2)$.
Solution (a) From Definition 6-3 it is easily seen that for large $x$
$$\sinh 3x \approx \tfrac{1}{2}e^{3x}.$$
Hence, applying the usual arguments, it follows at once that
$$\lim_{x\to\infty} \frac{5\sinh 3x + xe^x}{4e^{3x}} = \lim_{x\to\infty} \frac{(5e^{3x}/2) + xe^x}{4e^{3x}} = \frac{5}{8}.$$
(b) $$f'(x) = [\cosh(x^2 + 3x + 1)^{1/2}]\cdot\frac{1}{2}\cdot\frac{2x + 3}{(x^2 + 3x + 1)^{1/2}}$$
so that
$$f'(x) = \frac{2x + 3}{2(x^2 + 3x + 1)^{1/2}}\cosh(x^2 + 3x + 1)^{1/2}.$$
(c) Set $y = \operatorname{arccosh}(\sin^2 x)$ so that
$$\sin^2 x = \cosh y.$$
Differentiation with respect to $x$ then gives
$$2\sin x\cos x = \sinh y\,\frac{dy}{dx}$$
or
$$\frac{dy}{dx} = \frac{2\sin x\cos x}{\sinh y}.$$
As we are told that $y = f(x) < 0$ it then follows that
$$\frac{dy}{dx} = \frac{-2\sin x\cos x}{\sqrt{\cosh^2 y - 1}} = \frac{-2\sin x\cos x}{\sqrt{\sin^4 x - 1}}$$
provided $\sin x \neq 1$.
(d) $$\frac{\partial f}{\partial x} = y\cosh(x^2 + y^2) + xy\frac{\partial}{\partial x}\cosh(x^2 + y^2)$$
$$= y\cosh(x^2 + y^2) + 2x^2y\sinh(x^2 + y^2).$$
Similarly,
$$\frac{\partial f}{\partial y} = x\cosh(x^2 + y^2) + 2xy^2\sinh(x^2 + y^2).$$
6-5 Exponential function with a complex argument
If we formally replace $x$ by $ix$ in the series expansion of $e^x$ in Theorem 6-1
we obtain
$$e^{ix} = 1 + ix - \frac{x^2}{2!} - i\frac{x^3}{3!} + \frac{x^4}{4!} + i\frac{x^5}{5!} - \frac{x^6}{6!} - \cdots + i^n\frac{x^n}{n!} + \cdots.$$
Clearly $e^{ix}$ is a complex number for any fixed real number $x$ and, writing
it in the form $e^{ix} = C(x) + iS(x)$, it follows by equating real and imaginary
parts that
$$C(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots$$
and
$$S(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^n\frac{x^{2n+1}}{(2n+1)!} + \cdots.$$
Thus, in fact, if $x$ is regarded as a variable, $S(x)$ and $C(x)$ are functions of
$x$ and $e^{ix}$ is, in some sense yet to be properly defined, a function of a complex
variable.
Assuming that the series for $C(x)$ may be differentiated term by term it is
easily verified that
$$C'(x) = -x + \frac{x^3}{3!} - \frac{x^5}{5!} + \frac{x^7}{7!} - \cdots + (-1)^{n+1}\frac{x^{2n+1}}{(2n+1)!} + \cdots.$$
Next, differentiating $C'(x)$ again with respect to $x$ yields
$$C''(x) = -1 + \frac{x^2}{2!} - \frac{x^4}{4!} + \frac{x^6}{6!} - \cdots + (-1)^{n+1}\frac{x^{2n}}{(2n)!} + \cdots,$$
showing that in fact
$$C''(x) = -C(x).$$
Now, setting $x = 0$ in the series for $C(x)$ and $C'(x)$, we find that
$C(0) = 1$ and $C'(0) = 0$.
Hence the function $C(x)$ is seen to be the solution of the special differential
equation
$$y'' + y = 0$$
with $y(0) = 1$ and $y'(0) = 0$.
This same differential equation with the conditions on $y$ was encountered
in Example 5-13 (a), where it was derived as the equation satisfied by $y = \cos x$
and its derivatives. Thus the function $C(x)$ is, in reality, the function $\cos x$.
An analogous argument establishes that $S(x) = \sin x$. On account of this
identification of $C(x)$ and $S(x)$ we may write
$$e^{ix} = \cos x + i\sin x. \qquad (6-48)$$
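The identification of $C(x)$ and $S(x)$ with $\cos x$ and $\sin x$ can be illustrated by summing the series numerically. The Python sketch below is not part of the original text; the function names, the number of terms retained, and the sample value of $x$ are assumptions.

```python
import math

def exp_series(z, terms=40):
    """Partial sum of the exponential series 1 + z + z^2/2! + ... for complex z."""
    total = 0j
    term = 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # next term z^(n+1)/(n+1)!
    return total

def split_series(x, terms=20):
    """Partial sums of the real series C(x) and S(x)."""
    c = sum((-1)**n * x**(2 * n) / math.factorial(2 * n) for n in range(terms))
    s = sum((-1)**n * x**(2 * n + 1) / math.factorial(2 * n + 1) for n in range(terms))
    return c, s

x0 = 1.2
print(exp_series(1j * x0))
print(split_series(x0))
print(math.cos(x0), math.sin(x0))
```

For moderate $x$ the partial sums agree with $\cos x + i\sin x$ to rounding error, in line with Eqn (6-48).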
As a direct consequence of replacing $x$ by $-x$ in Eqn (6-48) and using the
fact that $\cos x$ is even, but $\sin x$ is odd, we find that
$$e^{-ix} = \cos x - i\sin x. \qquad (6-49)$$
Combination of Eqns (6-48) and (6-49) leads to the following definitions of
the sine and cosine functions.
Definition 6-6
$$\sin x = \frac{e^{ix} - e^{-ix}}{2i} \quad\text{and}\quad \cos x = \frac{e^{ix} + e^{-ix}}{2}.$$
Comparison of Eqns (4-15) and (6-48) shows that $e^{ix}$ represents a complex
number of unit modulus lying on the unit circle drawn about the origin.
The argument of $e^{ix}$ is $x$.
Slightly more general than Eqn (6-48) is the complex number $e^{(x+iy)}$ for,
by the property of indices together with Eqn (6-48), we have
$$e^{(x+iy)} = e^x\cdot e^{iy} = e^x(\cos y + i\sin y), \qquad (6-50)$$
showing that
$$|e^{(x+iy)}| = e^x \quad\text{and}\quad \arg e^{(x+iy)} = y. \qquad (6-51)$$
Thus the modulus-argument form of a general non-zero complex number $z$
may be written
$$z = re^{i\theta},$$
where
$$r = |z| \quad\text{and}\quad \theta = \arg z. \qquad (6-52)$$
This is, of course, an alternative form of Eqn (4-15).
As it is true for any exponent $\alpha$ that $(a^x)^\alpha = a^{\alpha x}$, it follows that
$(e^{ix})^\alpha = e^{i\alpha x}$, so that from Eqn (6-48) we arrive at the result
$$(\cos x + i\sin x)^\alpha = \cos\alpha x + i\sin\alpha x. \qquad (6-53)$$
This is simply de Moivre's theorem (Theorem 4-2) for any exponent $\alpha$ and
not just for the integral values used in the first proof of this important theorem.
To close, let us apply these results to give an alternative derivation of the
results of Example 4-10, and also to express $\sin^n\theta$ and $\cos^n\theta$ in terms of
sums involving $\sin r\theta$ and $\cos r\theta$, as promised in that example. As in Chapter
4, the argument is best presented by example.
Example 6-10
(a) Express $\sin n\theta$ and $\cos n\theta$ in terms of $\cos\theta$ and $\sin\theta$. Deduce the
form taken by the result when $n = 4$.
(b) Express $\cos^7\theta$ in terms of $\cos r\theta$.
(c) Express $\sin^5\theta$ in terms of $\sin r\theta$.
Solution
(a)
$$\cos n\theta = \mathrm{Re}(e^{in\theta}) = \mathrm{Re}[(e^{i\theta})^n] = \mathrm{Re}[(\cos\theta + i\sin\theta)^n],$$
$$\sin n\theta = \mathrm{Im}(e^{in\theta}) = \mathrm{Im}[(e^{i\theta})^n] = \mathrm{Im}[(\cos\theta + i\sin\theta)^n].$$
When $n = 4$ we have
$$(\cos\theta + i\sin\theta)^4 = \cos^4\theta + 4i\cos^3\theta\sin\theta - 6\cos^2\theta\sin^2\theta - 4i\cos\theta\sin^3\theta + \sin^4\theta.$$
Hence
$$\cos 4\theta = \mathrm{Re}[(\cos\theta + i\sin\theta)^4] = \cos^4\theta - 6\cos^2\theta\sin^2\theta + \sin^4\theta$$
and
$$\sin 4\theta = \mathrm{Im}[(\cos\theta + i\sin\theta)^4] = 4(\cos^3\theta\sin\theta - \cos\theta\sin^3\theta).$$
(b) From Definition 6-6 we may write
$$\cos^7\theta = \left(\frac{e^{i\theta} + e^{-i\theta}}{2}\right)^7.$$
Expanding the right-hand side by the Binomial theorem, simplifying and
grouping terms, we obtain
$$\cos^7\theta = \frac{1}{2^6}\left(\frac{e^{7i\theta} + e^{-7i\theta}}{2} + 7\,\frac{e^{5i\theta} + e^{-5i\theta}}{2} + 21\,\frac{e^{3i\theta} + e^{-3i\theta}}{2} + 35\,\frac{e^{i\theta} + e^{-i\theta}}{2}\right).$$
Again using Definition 6-6, we see that this immediately simplifies to
$$\cos^7\theta = \frac{1}{64}(\cos 7\theta + 7\cos 5\theta + 21\cos 3\theta + 35\cos\theta).$$
(c) From Definition 6-6 we may write
$$\sin^5\theta = \left(\frac{e^{i\theta} - e^{-i\theta}}{2i}\right)^5.$$
Expanding the right-hand side, simplifying and grouping terms gives
$$\sin^5\theta = \frac{1}{2^4}\left(\frac{e^{5i\theta} - e^{-5i\theta}}{2i} - 5\,\frac{e^{3i\theta} - e^{-3i\theta}}{2i} + 10\,\frac{e^{i\theta} - e^{-i\theta}}{2i}\right).$$
Again appealing to Definition 6-6, we see that this immediately reduces to
$$\sin^5\theta = \frac{1}{16}(\sin 5\theta - 5\sin 3\theta + 10\sin\theta).$$
A variant of the method used here and in example (b) above is to be found
outlined in Problems 6-37 and 6-38.
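The expansions obtained in Example 6-10 can be verified at a few sample angles. The Python sketch below is an illustration only and is not part of the original text; the function names and the sample angles are assumptions.

```python
import math

def cos7_reduced(t):
    # Right-hand side of the result of part (b).
    return (math.cos(7 * t) + 7 * math.cos(5 * t)
            + 21 * math.cos(3 * t) + 35 * math.cos(t)) / 64

def sin5_reduced(t):
    # Right-hand side of the result of part (c).
    return (math.sin(5 * t) - 5 * math.sin(3 * t) + 10 * math.sin(t)) / 16

def cos4_expanded(t):
    # Right-hand side of the expression for cos 4(theta) in part (a).
    c, s = math.cos(t), math.sin(t)
    return c**4 - 6 * c**2 * s**2 + s**4

def sin4_expanded(t):
    # Right-hand side of the expression for sin 4(theta) in part (a).
    c, s = math.cos(t), math.sin(t)
    return 4 * (c**3 * s - c * s**3)

for t in (0.0, 0.4, 1.3, 2.9):
    print(t,
          math.cos(t)**7 - cos7_reduced(t),
          math.sin(t)**5 - sin5_reduced(t),
          math.cos(4 * t) - cos4_expanded(t),
          math.sin(4 * t) - sin4_expanded(t))
```

Each printed difference should be zero to within rounding error.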
PROBLEMS
Section 6-1
6-1 Solve the differential equation $dy/dx = y$, with $y(0) = c$, as in Section 6-1, by
substituting
$$y = \sum_{r=0}^{\infty} a_r x^r.$$
Hence deduce that, provided $c \neq 0$, the differential equation has the
non-trivial solution $y = ce^x$.
6-2 The function $y = e^{-x}$ satisfies the differential equation $dy/dx = -y$, with
$y(0) = 1$. Use the method of the previous problem to verify the series solution.
6-3 It follows from the argument preceding Eqn (6-16) in Section 6-1 that
< S n - Sr <
x"
(R - D!
where the integer $R > 2x$. Use this result to deduce the least number of terms
that must be included in the series expansion of $e^2$ in order that the error
involved is less than 0.01.
6-4 Evaluate the following limits:
4e 2j + xe x + 3
(a) lim
(b) lim
^ x 5xe 3x + c*+ 1
(x 2 + l)e 3 * + e z + 1,
(2x 2 - 3x + l)e 3 *
(2 - x 2 )f + 3 _
V^T+v + xy**'
,„ ,. 3(2 e~ 3 * + x 2 + 1 )
(d) ^o 4e* + 2*+l •
6-5 Make use of the series expansion of c x to evaluate the following limits and
verify your result by using Theorem 5- 14:
(a) lim — ;
1 — e~ x
(b) lim -^-r-;
3-_>o sin Ax
, , .. & -\-x
6-6 Differentiate the following functions:
(a)/(x) = 2e*cosx;
(b) /"(jc) = e 3 -* arcsin x;
(c)'/(x) = e*/x 2 ;
(d) /(x) = e* 8lM .
6-7 Differentiate the following functions:
(a) f{x) — arcsin e 2 *;
(b)/(*) = v '(*e* + x);
(c)/(x) = sinOe*+ 2);
(d) /(*) = (e* - l)/(e* + 1).
Section 6-2
6-8 Differentiate the following functions:
(a) /**) = 3 exp [-(** + *+ 1)];
(b) f(x) = e si " 2j: ;
(c) /(jc) = cos [exp (x sin x + 2)].
6-9 Find the second derivatives of the following functions:
(a)/(x) = e 3 * 2 ;
(b)/(x) = sin(l + e 2 *);
(c) f(x) = e sinr .
6-10 Consider the function $f(x)$ defined as follows:
$$f(x) = \begin{cases} e^{-1/x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$
Clearly the differentiability properties of this function at the origin must be
deduced directly from the definition of a derivative. To deduce these properties
show first that for $x \neq 0$, it follows that
$$f'(x) = \frac{2}{x^3}e^{-1/x^2}.$$
Then, by using Definition 5-2 together with Theorem 6-3, prove that $f'(0) = 0$,
and hence deduce that
$$\lim_{x\to 0} f'(x) = f'(0) = 0.$$
Finally, deduce that in general,
$$f^{(n)}(x) = e^{-1/x^2} \times (\text{polynomial in } 1/x),$$
and hence by using an inductive argument prove that $f^{(n)}(0) = 0$ for all $n$.
This is an example of a function which is capable of differentiation an arbitrary
number of times for all $x$, and yet which has every derivative equal to zero at
one point of its domain of definition.
6-11 Find $\partial f/\partial x$ and $\partial f/\partial y$, given that
$$f(x, y) = e^{\sin(y/x)}.$$
6-12 Show that $u = xy + xe^{y/x}$ satisfies the equation
$$x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} = xy + u.$$
6- 13 Find d//d/, given that
f(x, y) = e 2 * +7 *'
where x = cos t, y = sin t.
6-14 Find $\partial f/\partial u$ and $\partial f/\partial v$ if
$$f(x, y) = 2\arctan\frac{x}{y}$$
with $x = u\sin v$ and $y = u\cos v$.
Section 6-3
6-15 Evaluate the following limits:
, , ,. (x- l)logx 2
(a) lifn v \ s ;
x— *cc X
(b) lim **££»■.
X-+K, Ax + 1
(c) lim
z—0
log (3 sin x) — log [(1 + x) sin x]
It* - 1
(d) lim [log (3* + 1) - log (2x + 5)];
(e) lim
J-*CO
log (1 + 2e*)
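6-16 Let $f(x)$ and $g(x)$ be functions such that $\lim_{x\to a} f(x) = 0$ and $\lim_{x\to a} g(x) = 0$ but
$\lim_{x\to a} \dfrac{f(x)}{g(x)} = k$. Then
$$\lim_{x\to a} \frac{\log[1 + f(x)]}{g(x)} = \lim_{x\to a} \log[1 + f(x)]^{1/g(x)} = \lim_{x\to a} \log\{[1 + f(x)]^{1/f(x)}\}^{f(x)/g(x)}.$$
However, it follows from Chapter 3, Section 3 that $\lim_{x\to a} [1 + f(x)]^{1/f(x)} = e$,
so that
$$\lim_{x\to a} \frac{\log[1 + f(x)]}{g(x)} = \lim_{x\to a} \log e^{f(x)/g(x)} = \lim_{x\to a} \frac{f(x)}{g(x)} = k.$$
Apply this result to evaluate the following limits: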
log(l +lx) _
2x
log (1 + 3 sinx)
(a) , im !5S0±H).
x^o 2x
(b) lim
x-*0
(c) lim
*—
log[l - 2sin 2 (x/2)],
use your result to deduce lim (cos x) 1,x .
a-—0
6-17 Apply Theorem 5-14 to evaluate the limits in Problem 6-16.
6-18 Differentiate the following functions:
(a) $f(x) = \log(x^3 + 7x^2 + 2)$;
(b) $f(x) = \log\sin 2x$;
I x — V
(c) f{x) = log cos
6-19 If $y = [f(x)]^{g(x)}$ then, taking the natural logarithm,
$$\log y = g(x)\log f(x).$$
Hence, differentiating with respect to $x$, it follows that
$$\frac{dy}{dx} = \left[g'(x)\log f(x) + \frac{g(x)}{f(x)}f'(x)\right][f(x)]^{g(x)}.$$
Use this result to differentiate the following functions:
(a) $y = x^x$;
(b) $y = (\sin 2x)^x$;
(c) $y = x^{\sin x}$;
(d) $y = 10^{\log\sin x}$.
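6-20 If $u = x\log(1 + x/y) + y\log(1 + y/x)$
show that
$$x^2\frac{\partial^2 u}{\partial x^2} = y^2\frac{\partial^2 u}{\partial y^2}.$$
6-21 Find the total derivative $dz$ given that
$$z = \log(x^2 + 2y^2).$$
6-22 Show that the function
$$f(x, y) = \arctan(y/x) + \log(x^2 + y^2)$$
satisfies the equation
$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0.$$
6-23 By taking logarithms deduce $\partial u/\partial x$, $\partial u/\partial y$, and $\partial u/\partial z$ if $u = (xy)^z$.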
Section 6-4
6-24 Use the definitions to establish the form taken by :
(a) sinh x;
(b) cosh x;
(c) tanh x;
when x is large. Distinguish between x large and positive and x large and
negative.
6-25 Prove by means of the definition that
(cosh x + sinh x) n = cosh nx + sinh nx.
6-26 Use the definitions to verify any three of the identities contained in Table 6-1.
6-27 Prove by means of the definitions that:
(a) $2\sinh x\cosh y = \sinh(x + y) + \sinh(x - y)$;
(b) $2\cosh x\cosh y = \cosh(x + y) + \cosh(x - y)$;
(c) $2\sinh x\sinh y = \cosh(x + y) - \cosh(x - y)$.
6-28 Verify any three of the entries in Table 6-2.
6-29 Verify the derivatives of arccosech $x/a$ and arcsech $x/a$ given in Table 6-3.
6-30 Evaluate the following limits, using the series (6-27) and (6-28) where necessary:
(a) $\lim_{x\to\infty} \dfrac{x^3\cosh 2x + e^x}{(2x^3 + x + 1)e^{2x} + x^3e^{-2x}}$;
(b) $\lim_{x\to -\infty} \dfrac{x^3\cosh 2x + e^x}{(2x^3 + x + 1)e^{2x} + x^3e^{-2x}}$;
(c) $\lim_{x\to 0} \dfrac{\sinh ax}{x}$;
(d) $\lim_{x\to 0} \dfrac{1 - \cosh 2x}{3x^2}$.
6-31 Differentiate the following functions:
(a) f(x) = sinh 2x cosh 2 x;
(b) f(x) = exp (1 + cosh 3x);
(c)/X*)=«log(tanhx);
(d) f{x) = arcsech (x 2 + J) if /(*) > 0;
(e) f{x) = cosh (sin 2x).
6-32 Evaluate dujdx and Suj8y given that :
(a) uix, y) = sin x cosh xy\
(b) «(x, j) = sinh (x 2 + x sin y + 3y 2 ) ;
(c) u(x,y) = xcosh^+2^
Section 6-5
6-33 Establish by means of the definitions that:
(a) $\sin(iz) = i\sinh z$;
(b) $\cos(iz) = \cosh z$;
(c) $\sinh(iz) = i\sin z$;
(d) $\cosh(iz) = \cos z$.
6-34 Given that $a$, $b$ are positive real numbers, deduce four trigonometric identities
by equating real and imaginary parts in each of the following results:
$$e^{ia}e^{ib} = e^{i(a+b)} \quad\text{and}\quad e^{ia}\cdot e^{-ib} = e^{i(a-b)}.$$
6-35 Express the following complex numbers in the form $re^{i\theta}$:
(a) 1 + i; (b) 1 - /; (c) -80V3 - 1);
(d) (-1+/) 8 ; (e) (5+ 140/(4 + /).
6-36 Show by means of de Moivre's theorem that:
(a) $32\cos^6\theta = 10 + 15\cos 2\theta + 6\cos 4\theta + \cos 6\theta$;
(b) $\sin 7\theta = 7\sin\theta - 56\sin^3\theta + 112\sin^5\theta - 64\sin^7\theta$.
6-37 Verify that if $z = e^{i\theta}$, then
$$\cos\theta = \frac{1}{2}\left(z + \frac{1}{z}\right) \quad\text{and}\quad \sin\theta = \frac{1}{2i}\left(z - \frac{1}{z}\right)$$
and, more generally,
$$\cos r\theta = \frac{1}{2}\left(z^r + \frac{1}{z^r}\right) \quad\text{and}\quad \sin r\theta = \frac{1}{2i}\left(z^r - \frac{1}{z^r}\right).$$
By replacing $\cos\theta$ and $\sin\theta$ by their equivalent expressions involving $z$,
make use of these results to express $\cos^2\theta\sin^3\theta$ in terms of $\sin n\theta$.
6-38 Use the method of Problem 6-37 to express $\sin^8\theta$ in terms of $\cos n\theta$.
6-39 Consider the function $\cosh z$, where $z = x + iy$. Then, using Definition 6-3,
deduce that $\cosh z = 0$ when $z = (2n + 1)\pi i/2$, with $n = 0, \pm 1, \pm 2, \ldots$.
Use the results of Problem 6-33 to deduce the zeros of $\cos z$.
6-40 Consider the function $\sin z$, where $z = x + iy$. Then, using Definition 6-6,
deduce that $\sin z = 0$ when $z = n\pi$, with $n = 0, \pm 1, \pm 2, \ldots$. Use the
results of Problem 6-33 to deduce the zeros of $\sinh z$.
Fundamentals of integration
7-1 Definite integrals and areas
The work of this chapter is concerned with the theory of the operation known
as integration, which occupies a central position in the calculus. The connec-
tion between differentiation and integration is basic to the whole of the
calculus and is contained in a result we shall prove later known as the funda-
mental theorem of calculus. Once again, limiting operations will play an
essential part in the development of our argument. In fact we will show not
only how they enable a satisfactory general theory of integration to be
established, but also how they provide a tool, albeit a clumsy one, for the
actual integration of functions. However, aside from a number of simple but
important examples, the practical details of the evaluation of integrals of
specific classes of function will be deferred until Chapter 8.
We begin by seeking to determine the shaded area $I$ of Fig. 7-1 which is
interior to the region bounded above and below by the curve $y = f(x)$ and
the $x$-axis, respectively, and to the left and right by the lines $x = a$, $x = b$.
This approach will lead naturally to what is called the definite integral of
$f(x)$ over the interval $a \le x \le b$, and it illustrates a valuable geometrical
interpretation of the process of integration. Although we use the definite
integral to give precise meaning to the notion of the area contained within a
closed curve, this appeal to geometry is not actually necessary when defining
an integral. Indeed, we shall also show how a purely analytical definition of
a definite integral, quite independent of any geometrical arguments, may be
formulated.
Fig. 7-1 Area $I$ defined by $y = f(x)$.
Let $f(x)$ be a non-negative continuous function defined in the closed
interval $[a, b]$ and consider, for a moment, the conceptual problem that arises
when trying to determine the area $I$ defined by it in Fig. 7-1. The only simple
plane geometrical figure for which the concept of area is defined in an
elementary and unambiguous manner is the rectangle, so that we shall seek to
define the area $I$ in terms of the limit of a sum of rectangular areas. It should
perhaps be remarked at this point that the derivation of the formula $\pi r^2$ for
the area of a circle of radius $r$ involves the concept of integration, although
this is invariably avoided in any first encounter by the employment of
arguments that are at best only plausible.
We shall start our discussion from the postulates that (a) the area of a
rectangle is given by the product length × breadth, (b) the area of the union
of two non-overlapping rectangles is the sum of their separate areas, and
(c) if a rectangle is divided into two parts by a curve, then the sum of the
separate non-rectangular areas comprising these two parts is equal to the
area of the rectangle.
On the basis of postulate (c), we at once see that the area $I$ in Fig. 7-1
exceeds the rectangular area ABEF, but is less than the rectangular area
ACDF. Letting $m$, $M$ denote, respectively, the minimum and maximum
values attained by $f(x)$ in $[a, b]$, this result becomes
$$m(b - a) \le I \le M(b - a). \qquad (7-1)$$
This inequality, although interesting, must obviously be refined if it is
ever to lead to the actual value of $I$. In principle, our approach will be simple,
for we shall begin by dividing $[a, b]$ into $n$ adjacent sub-intervals in each of
which an inequality of type (7-1) will apply, after which we shall use postulate
(b) to find better upper and lower bounds for $I$.
Specifically, we start by choosing any sequence of n + 1 numbers xo,
xi, . . ., x n subject only to the requirements that .yo = a, x n = b, and
Xo < XI < • • • < X n -1 < X n .
The sequence {x r }" r =o so defined is called a partition P of the interval [a, b],
and for any given value of n it is obviously not unique. Next, on each sub-
interval [xi-u xt], let the function f(x) attain a minimum value mi and a
maximum value Mi and denote the length of the /th sub-interval by A{, so
that
Aj = x t — Xi-i.
We now define numbers $\underline{S}_P$ and $\overline{S}_P$ called, respectively, the lower and upper
sums taken over the partition $P$, by the expressions
$$\underline{S}_P = m_1\Delta_1 + m_2\Delta_2 + \cdots + m_n\Delta_n = \sum_{r=1}^{n} m_r\Delta_r \tag{7-2}$$
and
$$\overline{S}_P = M_1\Delta_1 + M_2\Delta_2 + \cdots + M_n\Delta_n = \sum_{r=1}^{n} M_r\Delta_r. \tag{7-3}$$
Clearly, as Figs. 7-2 (a), (b) illustrate, $\underline{S}_P$ and $\overline{S}_P$ are, respectively, under- and
over-estimates of the area $I$.
Fig. 7-2 (a) Shaded area represents lower sum $\underline{S}_P$; (b) shaded area represents
upper sum $\overline{S}_P$.
The fact that $\underline{S}_P \le \overline{S}_P$ is apparent on geometrical grounds, but it also
follows without appeal to geometry by considering the difference
$$\overline{S}_P - \underline{S}_P = (M_1 - m_1)\Delta_1 + (M_2 - m_2)\Delta_2 + \cdots + (M_n - m_n)\Delta_n. \tag{7-4}$$
In this equation we have, by definition, $\Delta_r > 0$ and $M_r \ge m_r$ for $r = 1,
2, \ldots, n$, so that
$$\overline{S}_P - \underline{S}_P \ge 0 \quad\text{or,}\quad \underline{S}_P \le \overline{S}_P,$$
and thus by postulate (c),
$$\underline{S}_P \le I \le \overline{S}_P. \tag{7-5}$$
It would seem reasonable to suppose that as the number $n$ of points in a
partition increases, provided the lengths of all intervals shrink to zero, the
limit of both the lower and upper sums must be $I$, the desired area. We prove
this in two stages, first considering the effect on the lower and upper sums of
the refinement of the partition $P$ by the inclusion of extra points.
It will suffice here to consider only the effect of the inclusion of one extra
point $x_r'$ between $x_{r-1}$ and $x_r$ in the partition $P$. The resulting partition $P'$ is
called a refinement of $P$, in the sense that although $P'$ has more points than $P$,
all points of $P$ are also points of $P'$.
Suppose that in the intervals $[x_{r-1}, x_r']$ and $[x_r', x_r]$ the function $f(x)$
attains the minimum values $m_r'$ and $m_r''$, respectively. Then the effect of the
extra point is to replace the term $m_r(x_r - x_{r-1})$ in the lower sum $\underline{S}_P$ by the
sum $m_r'(x_r' - x_{r-1}) + m_r''(x_r - x_r')$, thereby generating the sum $\underline{S}_{P'}$
appropriate to the refinement $P'$ of the partition $P$. As it must be true that $m_r \le m_r'$
and $m_r \le m_r''$, it thus follows that
$$m_r'(x_r' - x_{r-1}) + m_r''(x_r - x_r') \ge m_r(x_r - x_{r-1}),$$
whence
$$\underline{S}_P \le \underline{S}_{P'}. \tag{7-6}$$
Identical reasoning involving the maxima $M_r'$ and $M_r''$ attained by $f(x)$ in
the intervals $[x_{r-1}, x_r']$ and $[x_r', x_r]$ establishes that
$$\overline{S}_{P'} \le \overline{S}_P. \tag{7-7}$$
Fig. 7-3 Effect of refinement of a partition: (a) area inequality on interval
$[x_{r-1}, x_r]$ of $P$; (b) area inequality on interval $[x_{r-1}, x_r]$ of $P'$.
The inequalities leading to results (7-6) and (7-7) are illustrated geometrically
in Figs. 7-3. Thus in Fig. 7-3 (a) the area inequalities associated with the
interval $[x_{r-1}, x_r]$ of $P$ are displayed, whilst in Fig. 7-3 (b) the corresponding
situation is displayed for the refinement $P'$ produced by inserting an additional
point $x_r'$ in $[x_{r-1}, x_r]$.
The further refinement of the partition $P'$ by the inclusion of additional
points only serves to reinforce results (7-6) and (7-7). We have thus established
that if the partitions $P_1, P_2, \ldots, P_m$ are successive refinements of the
partition $P$, then
$$m(b - a) \le \underline{S}_{P_1} \le \underline{S}_{P_2} \le \cdots \le \underline{S}_{P_m} \le I \le \overline{S}_{P_m} \le \overline{S}_{P_{m-1}} \le \cdots \le \overline{S}_{P_1} \le M(b - a). \tag{7-8}$$
Expressed in words, the effect of refinement of a partition is to increase the
corresponding lower sum and to decrease the corresponding upper sum, so
that $\{\underline{S}_{P_r}\}$ is a monotonic increasing sequence of numbers, and $\{\overline{S}_{P_r}\}$ is a
monotonic decreasing sequence of numbers.
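The behaviour described by (7-8) can be watched numerically. The following is only a minimal sketch under stated assumptions: the function $f(x) = x^2$, the interval $[1, 2]$, the starting partition, and the helper names `lower_upper_sums` and `refine` are all illustrative choices and not part of the text. Because the chosen function is monotonic increasing, its minimum and maximum on each sub-interval are taken at the left and right end-points, which is what the code relies on.

```python
def lower_upper_sums(f, points):
    """Lower and upper sums of f over the partition given by `points`.

    Assumption: f is monotonic increasing on the whole interval, so on each
    sub-interval the minimum is at the left end and the maximum at the right.
    """
    pairs = list(zip(points, points[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper


def refine(points):
    """Insert the midpoint of every sub-interval; all old points are kept."""
    refined = []
    for x0, x1 in zip(points, points[1:]):
        refined += [x0, (x0 + x1) / 2]
    refined.append(points[-1])
    return refined


f = lambda x: x * x
partition = [1.0, 1.3, 2.0]      # an arbitrary (unequal) partition of [1, 2]
history = []
for _ in range(6):
    history.append(lower_upper_sums(f, partition))
    partition = refine(partition)

for lower, upper in history:
    print(f"{lower:.6f} <= I <= {upper:.6f}")
```

Each printed line brackets the area more tightly than the one before it, with the lower sums rising and the upper sums falling.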
For the second and final stage of our argument we introduce the norm
$\|\Delta\|_P$ of a partition $P$ by means of the definition
$$\|\Delta\|_P = \max_i\,(x_i - x_{i-1}). \tag{7-9}$$
That is to say, for any partition $P$ of the interval $[a, b]$, the norm $\|\Delta\|_P$ is
the length of the longest sub-interval of $[a, b]$ produced by the partition.
Let us consider a sequence of partitions which are successive refinements
of $P$ and are such that
$$\lim_{m\to\infty} \|\Delta\|_{P_m} = 0.$$
Then by the postulate of Section 3-2, as $\{\underline{S}_{P_r}\}$ is monotonic increasing and
bounded above it must tend to a limit $\underline{S}$ and, similarly, as $\{\overline{S}_{P_r}\}$ is monotonic
decreasing and bounded below it must tend to a limit $\overline{S}$, where
$$\underline{S} \le I \le \overline{S}. \tag{7-10}$$
To show that $\underline{S} = \overline{S}$, as would be expected, observe that if
$$\delta_P = \max_i\,(M_i - m_i),$$
then Eqn (7-4) gives rise to the inequality
$$\overline{S}_P - \underline{S}_P \le \delta_P(\Delta_1 + \Delta_2 + \cdots + \Delta_n) = \delta_P(b - a). \tag{7-11}$$
Hence, for any sequence of partitions $P_1, P_2, \ldots, P_m, \ldots$ which are refinements
of $P$ with the property that $\lim \|\Delta\|_{P_m} = 0$, it follows from the
continuity of $f(x)$ that $\lim \delta_{P_m} = 0$, thereby showing that $\{\overline{S}_{P_m} - \underline{S}_{P_m}\}$ is a
null sequence. Thus $\{\underline{S}_{P_m}\}$ and $\{\overline{S}_{P_m}\}$ both have the same limit.
Taken in conjunction with Eqn (7-10), we have proved that because of the
continuity of $f(x)$, the limit of the lower sums is equal to the limit of the upper
sums, and each is equal to the limit $I$ which has been interpreted as the
shaded area in Fig. 7-1.
The limiting argument used above certainly suffices to define the area $I$,
but before formulating our definition of the definite integral, let us first make
a useful generalization of our argument. With the partition $P$ used earlier
associate any set of $n$ numbers $\xi_1, \xi_2, \ldots, \xi_n$ for which it is true that
$$x_0 \le \xi_1 \le x_1,\quad x_1 \le \xi_2 \le x_2,\quad \ldots,\quad x_{n-1} \le \xi_n \le x_n.$$
Now form the approximating sum $S_P$ defined by
$$S_P = f(\xi_1)\Delta_1 + f(\xi_2)\Delta_2 + \cdots + f(\xi_n)\Delta_n. \tag{7-12}$$
Then because $m_i \le f(\xi_i) \le M_i$, it follows at once that
$$\underline{S}_{P_m} \le S_{P_m} \le \overline{S}_{P_m}, \tag{7-13}$$
for all refinements $P_1, P_2, \ldots, P_m, \ldots$ of the partition $P$. Consequently,
since $\lim \underline{S}_{P_m} = \lim \overline{S}_{P_m} = I$, it follows immediately from Theorem 3-6 that
$\lim S_{P_m} = I$.
This important result asserts that if $f(x)$ is continuous on $[a, b]$, then as
the partition is refined, so the corresponding upper and lower sums $\overline{S}_{P_m}$,
$\underline{S}_{P_m}$ and the approximating sum $S_{P_m}$ all converge to the same limit. We now
state this as our first fundamental theorem which forms the basis of our
development of the integral.
Theorem 7-1 (first limit theorem for sums defined on a partition) Let
$f(x)$ be a continuous non-negative function on the closed interval $a \le x \le b$,
and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some
partition $P$ of $[a, b]$ with the property that $\lim \|\Delta\|_{P_m} = 0$. Then, if $\xi_i$ is any
point in the $i$th sub-interval of length $\Delta_i$ generated by the partition $P_m$, and
$\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ are respectively the lower and upper sums associated with $P_m$,
it follows that
$$\lim_{m\to\infty} \underline{S}_{P_m} = \lim_{m\to\infty} \overline{S}_{P_m} = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i.$$
This theorem suggests the following form of definition for the definite
integral.
Definition 7-1 (definite integral of a continuous non-negative function)
Let $f(x)$ be a continuous non-negative function on the closed interval
$a \le x \le b$, and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements
of some partition $P$ of $[a, b]$ with the property that $\lim \|\Delta\|_{P_m} = 0$.
Then, if $\xi_i$ is any point in the $i$th sub-interval of length $\Delta_i$ generated by the
partition $P_m$, the definite integral of $f(x)$ integrated over the interval $[a, b]$,
and written symbolically
$$\int_a^b f(x)\,dx,$$
is defined to be
$$\int_a^b f(x)\,dx = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i.$$
In the context of a definite integral, the function $f(x)$ is called the integrand,
the numbers $a$, $b$ are called the lower and upper limits of integration,
respectively, and the sign $\int$ itself is called the integral sign.
In summary then, a definite integral of a positive continuous function
$f(x)$ integrated over the interval $[a, b]$ is a positive number defined by means
of a limiting process. It may be interpreted geometrically as the shaded area
$I$ below the curve $y = f(x)$ as shown in Fig. 7-1.
To show that this is a working definition, in the sense that it can be used
to yield a useful answer, let us now apply it to a simple function.
Example 7-1 Evaluate the definite integral
$$\int_a^b x^2\,dx, \quad\text{where } a < b.$$
Solution As $x^2$ is everywhere continuous and is non-negative on the stated
interval Definition 7-1 applies. Thus we start by considering a convenient
partition $P_n$ in which $[a, b]$ is divided into $n$ equal sub-intervals, each of
length $\Delta = (b - a)/n$. Then, if for convenience we identify $\xi_i$ with the
right-hand end-point of the $i$th sub-interval, we have
$$\xi_1 = a + \Delta,\quad \xi_2 = a + 2\Delta,\quad \xi_3 = a + 3\Delta,\quad \ldots,\quad \xi_n = a + n\Delta.$$
Hence, from Definition 7-1,
$$I = \lim_{n\to\infty} \sum_{i=1}^{n} (a + i\Delta)^2\Delta.$$
Expanding and grouping the terms of the summation then gives
$$I = \lim_{n\to\infty} \left[na^2\Delta + 2a\Delta^2(1 + 2 + 3 + \cdots + n) + \Delta^3(1^2 + 2^2 + 3^2 + \cdots + n^2)\right].$$
Using the fact that $\Delta = (b - a)/n$ together with the well-known results
$$1 + 2 + 3 + \cdots + n = \frac{n}{2}(n + 1)$$
and
$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6},$$
it follows that
$$I = \lim_{n\to\infty} \left[a^2(b - a) + a(b - a)^2\,\frac{n + 1}{n} + (b - a)^3\,\frac{(n + 1)(2n + 1)}{6n^2}\right].$$
Thus, taking the limit, we find
$$I = \tfrac{1}{3}(b^3 - a^3),$$
and so
$$\int_a^b x^2\,dx = \tfrac{1}{3}(b^3 - a^3).$$
In terms of numbers, if $a = 1$, $b = 2$, then
$$\int_1^2 x^2\,dx = \tfrac{1}{3}(2^3 - 1^3) = \frac{7}{3}.$$
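The limiting process of Example 7-1 can also be followed numerically. The sketch below is illustrative only: the function names `right_endpoint_sum` and `bracket_before_limit` are our own, and the second simply evaluates the bracketed expression of the solution for a finite $n$, before the limit is taken.

```python
def right_endpoint_sum(f, a, b, n):
    """Approximating sum of Definition 7-1 with xi_i = a + i*delta."""
    delta = (b - a) / n
    return sum(f(a + i * delta) for i in range(1, n + 1)) * delta


def bracket_before_limit(a, b, n):
    """The bracketed expression of Example 7-1 for a finite n."""
    return (a**2 * (b - a)
            + a * (b - a)**2 * (n + 1) / n
            + (b - a)**3 * (n + 1) * (2 * n + 1) / (6 * n**2))


a, b = 1.0, 2.0
for n in (10, 100, 1000):
    direct = right_endpoint_sum(lambda x: x * x, a, b, n)
    print(n, direct, bracket_before_limit(a, b, n))

print("limit:", (b**3 - a**3) / 3)
```

The two columns agree for each $n$, and both approach $\tfrac{1}{3}(b^3 - a^3)$ as $n$ increases.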
When the behaviour of $f(x)$ is monotonic over the interval $a \le x \le b$,
then Theorem 7-1 coupled with Definition 7-1 can often be used to derive
interesting and useful series approximations to the definite integral as the
following example illustrates.
Example 7-2 Show that
$$\sum_{r=1}^{n} \left(\frac{1}{n + r - 1}\right) \ge \int_1^2 \frac{dx}{x} \ge \sum_{r=1}^{n} \left(\frac{1}{n + r}\right).$$
Solution In this case $f(x) = 1/x$, which is continuous, positive, and monotonic
decreasing on the interval $[1, 2]$ so that Theorem 7-1 and Definition 7-1
apply. We again choose a partition $P_n$ which divides the interval $[1, 2]$ into
$n$ equal sub-intervals of length $\Delta = 1/n$. The general point $x_r$ in the partition
$P_n$ is, of course, $x_r = 1 + r/n$ so that
$$f(x_r) = \frac{n}{n + r}.$$
Thus as $f(x)$ is monotonic decreasing, it follows that on the interval $[x_{r-1}, x_r]$,
$f(x)$ attains its maximum value $M_r$ at $x_{r-1}$ and its minimum value $m_r$ at $x_r$,
where
$$M_r = \frac{n}{n + r - 1} \quad\text{and}\quad m_r = \frac{n}{n + r}.$$
Hence
$$\underline{S}_{P_n} = \sum_{r=1}^{n} \left(\frac{n}{n + r}\right)\frac{1}{n} \quad\text{and}\quad \overline{S}_{P_n} = \sum_{r=1}^{n} \left(\frac{n}{n + r - 1}\right)\frac{1}{n},$$
so that from Theorem 7-1 and Definition 7-1, we deduce that
$$\sum_{r=1}^{n} \left(\frac{1}{n + r - 1}\right) \ge \int_1^2 \frac{dx}{x} \ge \sum_{r=1}^{n} \left(\frac{1}{n + r}\right).$$
A few numbers might help here, so we show in the table below the behaviour
of the upper and lower sums $\overline{S}_{P_n}$ and $\underline{S}_{P_n}$ as a function of $n$.

  $n$        $\overline{S}_{P_n}$    $\underline{S}_{P_n}$
  5          0.7456                  0.6456
  10         0.7188                  0.6688
  15         0.7101                  0.6768
  $\infty$   0.6931                  0.6931

We shall discover later that the exact result, which is shown in this table
against the entry $n = \infty$, is in fact $\log_e 2$.
Before closing this section let us give brief consideration to the effect on
Theorem 7-1 of removing the condition of continuity imposed on the function
$f(x)$ and substituting instead the condition that $f(x)$ is bounded. The argument
leading to Theorem 7-1 proceeds as before until the stage at which $\underline{S}_{P_m}$
and $\overline{S}_{P_m}$ are defined. Then, without the continuity of $f(x)$ to ensure that
$|M_r - m_r| \to 0$ as $|x_r - x_{r-1}| \to 0$, it is no longer possible to infer that
when $\lim \underline{S}_{P_m}$ and $\lim \overline{S}_{P_m}$ exist, they are necessarily equal. However, if they
do exist and are equal, it follows as before that $S_{P_m}$ also converges to the
same limit. Thus we arrive at the following more general form of Theorem
7-1.
Theorem 7-2 (second limit theorem for sums defined on a partition) Let
$f(x)$ be a non-negative bounded function defined on the closed interval $[a, b]$,
and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some
partition $P$ of $a \le x \le b$ with the property that $\lim \|\Delta\|_{P_m} = 0$. Then, if
$\xi_i$ is any point in the $i$th sub-interval of length $\Delta_i$ generated by the partition
$P_m$, and $\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ are respectively the lower and upper sums associated
with $P_m$, it follows that if
$$\lim_{m\to\infty} \underline{S}_{P_m} = \lim_{m\to\infty} \overline{S}_{P_m} = I,$$
it must also be true that
$$I = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i.$$
The corresponding modification of Definition 7-1 is given below for
reference and, because this form of definition was first given by B. Riemann
(1826-66), the definite integral is known formally as the Riemann integral.
Usually only the term definite integral will be employed.
Definition 7-2 (Riemann integral of a non-negative function) Let $f(x)$
be a non-negative bounded function on the closed interval $a \le x \le b$, and
let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some
partition $P$ of $[a, b]$ with the property that $\lim \|\Delta\|_{P_m} = 0$. Furthermore,
let $\xi_i$ be any point in the $i$th sub-interval of length $\Delta_i$ generated by the
partition $P_m$, and let $\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ be, respectively, the lower and upper sums
associated with $P_m$.
Then, if
$$\lim_{m\to\infty} \underline{S}_{P_m} = \lim_{m\to\infty} \overline{S}_{P_m},$$
the Riemann integral of $f(x)$ integrated over the interval $[a, b]$, and written
symbolically
$$\int_a^b f(x)\,dx,$$
is defined to be
$$\int_a^b f(x)\,dx = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i.$$
To show that not all bounded functions are Riemann integrable it is only
necessary to consider the integral over the interval $0 \le x \le 1$ of the function
$$f(x) = \begin{cases} 1 & \text{for } x \text{ rational} \\ 0 & \text{for } x \text{ irrational.} \end{cases}$$
Then clearly $f(x)$ is non-negative and bounded on $[0, 1]$, but by a suitable
choice of the numbers $\xi_i$ in the approximating sum of Definition 7-2, the
limit of the sum may be made to assume any value between zero and unity.
This situation arises because the limits of the upper and lower sums are not
the same. In more advanced accounts these difficulties are overcome by
defining a more general form of integral known as the Lebesgue integral.
7-2 Integration of arbitrary continuous functions
As most functions assume both positive and negative values in their domain
of definition, our notion of a definite integral as formulated so far is rather
restrictive, for it requires that the integrand be non-negative. A brief examina-
tion of the introductory arguments used in the previous section shows that
this restriction stems from our idea of area as being an essentially positive
quantity, although this was not stated explicitly at any stage in our argument.
Nothing in the limiting arguments that we used requires either the upper
and lower sums themselves, or any of the terms comprising them to be
non-negative. Since a term in either of these sums will be negative when $m_r$ or
$M_r$ is negative, that is, when $f(x)$ is negative, it follows that the
interpretation of a definite integral as an area may be extended to continuous
functions $f(x)$ which assume negative values provided that areas below the
$x$-axis are regarded as negative. This is illustrated in Fig. 7-4 in which the
positive and negative area contributions to the definite integral of $f(x)$
integrated over the interval $[a, b]$ are marked accordingly.
Thus using this convention when interpreting a definite integral as an
area, we may remove the condition that the integrand $f(x)$ be non-negative
throughout all of Section 7-1. Because it simply amounts to the deletion of the
word 'non-negative', we shall not trouble to reformulate our earlier definitions
had we introduced the definite integral via the upper and lower sums, without
any appeal to graphs and areas, this artificial restriction would never have
arisen.
The definition of a definite integral of a function $f(x)$ integrated over the
interval $[a, b]$ immediately implies a number of important general results
which we now state in the form of a theorem. No proofs will be offered since
the results are virtually self-evident.
Theorem 7-3 (properties of definite integrals) Let $f(x)$, $g(x)$ be continuous
functions defined on the closed interval $a \le x \le b$, and let $c$ be a constant
and $k$ be such that $a \le k \le b$. Then
(a) $\int_a^b f(x)\,dx = \int_a^k f(x)\,dx + \int_k^b f(x)\,dx$ (Additivity with respect to interval of integration),
(b) $\int_a^b cf(x)\,dx = c\int_a^b f(x)\,dx$ (Homogeneity),
(c) $\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$ (Linearity).
Fig. 7-4 Positive and negative areas defined by $y = f(x)$.
By virtue of these results, the definite integral of the function $f(x)$ appropriate
to Fig. 7-4 could, if desired, be written in terms of the sum of three
integrals involving non-negative integrands. To achieve this, notice that $f(x)$
is negative for $k_1 < x < k_2$, so that for all $x$ in this interval, $-f(x)$ is positive.
Then, first expressing our integral as the sum of three separate integrals over
adjacent intervals
$$\int_a^b f(x)\,dx = \int_a^{k_1} f(x)\,dx + \int_{k_1}^{k_2} f(x)\,dx + \int_{k_2}^{b} f(x)\,dx, \tag{7-14}$$
we can replace $-f(x)$ by $|f(x)|$ in the second of these integrals to obtain
$$\int_a^b f(x)\,dx = \int_a^{k_1} f(x)\,dx - \int_{k_1}^{k_2} |f(x)|\,dx + \int_{k_2}^{b} f(x)\,dx. \tag{7-15}$$
Each of these integrals is now the definite integral of a non-negative
function as required.
We must now take account of the fact that so far it has been implicit in
our definition of a definite integral that $x$ increases positively from $a$ to $b$,
where $b > a$. This sense, or direction, of integration is indicated in the definite
integral by writing $a$ at the bottom of the integral sign $\int$ to signify the lower
limit of integration and by writing $b$ at the top to signify the upper limit of
integration. If, despite the fact that $b > a$, their positions as upper and lower
limits of integration are reversed, this implies that integration is to be carried
out in the direction in which $x$ increases negatively. Because we are now
allowing areas to have both magnitude and sign, to be consistent we must
compensate for a reversal of the limits of integration by changing the sign of
the integral. Hence we arrive at our next definition.
Definition 7-3 (reversal of limits of integration) If $a < b$, then we
define the definite integral
$$\int_b^a f(x)\,dx$$
of a continuous function $f(x)$ by the equation
$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx.$$
Example 7-3 Evaluate the definite integral
$$\int_3^1 2x^2\,dx.$$
Solution From Definition 7-3 we have
$$\int_3^1 2x^2\,dx = -\int_1^3 2x^2\,dx.$$
Hence an application of Theorem 7-3 (b) together with the result of Example
7-1 shows that
$$\int_3^1 2x^2\,dx = -2\int_1^3 x^2\,dx = -2\left(\tfrac{1}{3}\right)(3^3 - 1^3) = -\frac{52}{3}.$$
Since a definite integral is simply a number, the choice of symbol used to
denote the argument of the function $f$ forming the integrand is arbitrary, and
often it is convenient to replace $x$ by some other variable, say $t$. Thus
$$\int_a^b f(x)\,dx \quad\text{and}\quad \int_a^b f(t)\,dt$$
are identical in meaning, so that
$$\int_a^b f(x)\,dx = \int_a^b f(t)\,dt. \tag{7-16}$$
On account of this fact, the variable in the integrand of a definite integral
is often called a dummy variable, and it is sometimes said to be 'integrated
out' when the integral is evaluated. This fact is usually recognized in modern
accounts of the theory of the definite integral by simply writing
$$\int_a^b f$$
in place of either of the expressions in Eqn (7-16). The full significance of the
symbol $dx$, which is suggestive of a differential, comes when changes of
variable of the form $x = g(u)$ are made in Eqn (7-16) and it is for this reason
that we choose to retain it. This matter will be taken up in detail in the next
chapter, where it is shown that because of the chain rule for differentiation,
$dx$ can indeed be interpreted as a differential.
Fig. 7-5 (a) Area $I$ bounded by curves $y = f(x)$ and $y = g(x)$; (b) area below
$y = f(x)$; (c) positive and negative areas defined by $y = g(x)$.
Now that the definite integral has been extended to arbitrary continuous
integrands we are in a position to determine quite general areas. Consider,
for example, the situation illustrated in Fig. 7-5 (a) in which it is desired to
determine the area $I$ of the shaded region. Then obviously, referring to
Figs. 7-5 (b), (c) we have
$$I = I_1 + I_2 - I_3 + I_4,$$
where $I_1$ to $I_4$ represent the positive areas identified by these symbols.
However, we know that
$$I_1 = \int_a^b f(x)\,dx,$$
and from the form of argument leading to Eqn (7-15) we also know that
$$-I_2 = \int_a^{k_1} g(x)\,dx, \quad I_3 = \int_{k_1}^{k_2} g(x)\,dx, \quad -I_4 = \int_{k_2}^{b} g(x)\,dx,$$
where $k_1$ and $k_2$ are the first and second points of intersection of $y = g(x)$
with the $x$-axis as $x$ increases from $a$ to $b$.
However, by Theorem 7-3 (a) we have
$$-I_2 + I_3 - I_4 = \int_a^b g(x)\,dx,$$
so that combining these results we obtain
$$I = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx.$$
From Theorem 7-3 (b) and (c) it then finally follows that
$$I = \int_a^b (f(x) - g(x))\,dx. \tag{7-17}$$
Fig. 7-6 Piecewise continuous function $y = f(x)$ defining a sequence of areas
$I_1, I_2, \ldots, I_{n+1}$.
Example 7-4 Find the area $I$ between the two curves $y = e^{2x}$ and $y = -x^2$,
which is bounded to the left by the line $x = 1$ and to the right by the line
$x = 3$.
Solution We start by making the obvious identifications $f(x) = e^{2x}$,
$g(x) = -x^2$, $a = 1$ and $b = 3$. Then from Eqn (7-17) it follows that
$$I = \int_1^3 (e^{2x} + x^2)\,dx,$$
whence, using the results of Example 7-1 and Problem 7-3, we find
$$I = \tfrac{1}{2}(e^6 - e^2) + \frac{26}{3}.$$
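As a check on Example 7-4, the integral in Eqn (7-17) may be approximated by a sum of the kind used in Definition 7-1. The sketch below takes $\xi_i$ to be the mid-point of each sub-interval, which is one permissible choice among many; the helper name `midpoint_integral` and the number of sub-intervals are assumptions made for illustration.

```python
import math


def midpoint_integral(f, a, b, n=2000):
    """Approximating sum with xi_i chosen as the mid-point of each sub-interval."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


f = lambda x: math.exp(2 * x)     # upper curve
g = lambda x: -x * x              # lower curve

numeric_area = midpoint_integral(lambda x: f(x) - g(x), 1.0, 3.0)
exact_area = 0.5 * (math.exp(6) - math.exp(2)) + 26 / 3

print(numeric_area, exact_area)
```

The two printed values agree to several significant figures, as the closed-form result of the example would lead us to expect.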
The fact that a definite integral is additive with respect to its interval of
integration enables a function to be integrated even when it has discontinuities,
provided only that they are finite in number and that elsewhere the
function is continuous and bounded. This result is perhaps best seen
diagrammatically, though an analytical justification can easily be given without
appeal to geometry. By way of example, consider the function $y = f(x)$
illustrated in Fig. 7-6 which is bounded and continuous everywhere except at
the discrete number of points $\eta_1, \eta_2, \ldots, \eta_n$. Such a function is said to be
piecewise continuous, for obvious reasons.
Using the valid interpretation of a definite integral in terms of area we see
that the total shaded area $I$ is the sum of the sequence of areas $I_1, I_2, \ldots,
I_{n+1}$, so that we may still write
$$I = \int_a^b f(x)\,dx, \tag{7-18}$$
but this time with the understanding that
$$\int_a^b f(x)\,dx = \int_a^{\eta_1-} f(x)\,dx + \int_{\eta_1+}^{\eta_2-} f(x)\,dx + \cdots + \int_{\eta_n+}^{b} f(x)\,dx. \tag{7-19}$$
Here, as before, we have used $\eta_i-$ to signify the limiting process of
approaching the point $x = \eta_i$ from the left, and $\eta_i+$ to signify the limiting
process of approaching the point $x = \eta_i$ from the right.
Example 7-5 Evaluate the definite integral
$$I = \int_0^2 f(x)\,dx$$
when
$$f(x) = \begin{cases} x^2 & \text{for } 0 \le x \le 1 \\ e^{5x} & \text{for } 1 < x \le 2. \end{cases}$$
Solution From Eqn (7-19) we have
$$I = \int_0^{1-} x^2\,dx + \int_{1+}^{2} e^{5x}\,dx,$$
so that evaluating the integrals and then taking the appropriate limits gives
$$I = \tfrac{1}{3} + \tfrac{1}{5}(e^{10} - e^5).$$
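The splitting prescribed by Eqn (7-19) translates directly into a computation in which each continuous piece is integrated separately. In the sketch below the helper `midpoint_integral` and the number of sub-intervals are illustrative assumptions; because mid-points are used, the function is never evaluated at the point of discontinuity $x = 1$ itself.

```python
import math


def midpoint_integral(f, a, b, n=20_000):
    """Approximating sum with xi_i chosen as the mid-point of each sub-interval."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# One integral for each continuous piece, as in Eqn (7-19).
piece_1 = midpoint_integral(lambda x: x * x, 0.0, 1.0)
piece_2 = midpoint_integral(lambda x: math.exp(5 * x), 1.0, 2.0)

numeric = piece_1 + piece_2
exact = 1 / 3 + (math.exp(10) - math.exp(5)) / 5

print(numeric, exact)
```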
Sometimes a more difficult situation than this arises in which either the
integrand tends to infinity at some point in the interval of integration or,
perhaps, the interval of integration itself is infinite in length. Such definite
integrals are called improper integrals, and the way in which to attribute a
value to any such integral is suggested by Eqn (7-19).
Let us illustrate something of the difficulty that can arise if ideas are not
made precise. Consider the integral
$$\int_{-1}^{1} \frac{dx}{x^2}.$$
Then since $y = 1/x^2$ is essentially positive, the area under the curve must
also be positive. Now if we apply the result of Problem 7-5 we have
$$\int_{-1}^{1} \frac{dx}{x^2} = -\frac{1}{1} + \frac{1}{-1} = -2,$$
which, since it is negative, contradicts our previous conclusion. What has
gone wrong? The trouble is that $1/x^2$ tends to infinity as $x \to 0$, so that the
arguments of Problem 7-5 are not applicable, for it was pre-supposed there
that the interval of integration excluded the origin. When dealing with
improper integrals of this type in which the integrand has an infinity within
the interval of integration we shall assign a value to the integral according to
the following definition.
Definition 7-4 (improper integral due to infinity of integrand) Let the
function $f(x)$ be continuous throughout the intervals $a \le x < c$ and $c < x
\le b$, and suppose that $f(x)$ has a singularity at $x = c$ in the sense that
$f(x)$ tends to infinity as $x \to c$. Then the integral of $f(x)$ over the interval of
integration $[a, b]$ is said to be improper, and it is defined to have the value
$$I = \lim_{\varepsilon\to 0} \int_a^{c-\varepsilon} f(x)\,dx + \lim_{\delta\to 0} \int_{c+\delta}^{b} f(x)\,dx,$$
whenever both limits involved exist. Under these circumstances the improper
integral will be said to converge to the value $I$. When either of the limits does
not exist, the integral will be said to be divergent. If the point $c$ coincides with
an end-point of the interval $[a, b]$, then $I$ is defined to be equal to the limit of
the single integral for which the interval of integration lies within $[a, b]$.
On the basis of this definition we are now able to determine the value to
be attributed to the improper integral used as an illustration above. Let us do
this in the form of an example.
Example 7-6 Evaluate the improper integrals:
$$\text{(a)}\ I_1 = \int_{-1}^{1} \frac{dx}{x^2}; \qquad \text{(b)}\ I_2 = \int_{-1}^{0} \frac{x^2 + 1}{x^2}\,dx.$$
Solution The integrand $1/x^2$ tends to infinity as $x \to 0$, so that for case (a),
when appealing to Definition 7-4, we need to make the identifications
$a = -1$, $b = 1$, $c = 0$ and $f(x) = 1/x^2$. Thus,
$$I_1 = \lim_{\varepsilon\to 0} \int_{-1}^{-\varepsilon} \frac{dx}{x^2} + \lim_{\delta\to 0} \int_{\delta}^{1} \frac{dx}{x^2}.$$
Using the result of Problem 7-5 we find that
$$I_1 = \lim_{\varepsilon\to 0} \left(\frac{1}{\varepsilon} - 1\right) + \lim_{\delta\to 0} \left(-1 + \frac{1}{\delta}\right) \to \infty.$$
Thus the improper integral (a) is divergent.
In case (b) the integrand is $(x^2 + 1)/x^2$, which again tends to infinity as
$x \to 0$. However, in this case we must make the identifications $a = -1$,
$b = 0$, $c = 0$, and $f(x) = 1 + 1/x^2$, so that this time the singularity in the
integrand occurs at the right-hand end-point of the interval of integration
$[-1, 0]$ (that is, at the upper limit of integration).
It then follows from Definition 7-4 that
$$I_2 = \lim_{\varepsilon\to 0} \int_{-1}^{-\varepsilon} \left(1 + \frac{1}{x^2}\right)dx,$$
which, from the results of Problems 7-2 (b) and 7-5, becomes
$$I_2 = \lim_{\varepsilon\to 0} \left(-\varepsilon + 1 + \frac{1}{\varepsilon} - 1\right) \to \infty.$$
Hence the improper integral (b) is also divergent.
The one remaining form of improper integral requiring consideration
occurs when the interval of integration is infinite. In these circumstances we
shall assign a value to the integral according to the following definition.
Definition 7-5 (improper integral due to infinite interval of integration)
Let the function $f(x)$ be continuous on the interval $[a, \infty)$, then the integral
of $f(x)$ over the interval of integration $[a, \infty)$ is said to be improper, and it is
defined to have the value
$$I_1 = \lim_{k\to\infty} \int_a^k f(x)\,dx,$$
whenever this limit exists. Under these circumstances the improper integral
will be said to converge to the value $I_1$. When the limit does not exist, the
integral will be said to be divergent. Similarly, if the interval of integration is
$(-\infty, b]$, then when the limit exists, the improper integral of $f(x)$ over the
interval of integration $(-\infty, b]$ is defined to have the value
$$I_2 = \lim_{k\to\infty} \int_{-k}^{b} f(x)\,dx.$$
Symbolically, these improper integrals will be denoted, respectively, by
$$I_1 = \int_a^{\infty} f(x)\,dx \quad\text{and}\quad I_2 = \int_{-\infty}^{b} f(x)\,dx.$$
Example 7-7 Evaluate the improper integral
$$I = \int_3^{\infty} \frac{dx}{x^2}.$$
Solution It follows at once from Definition 7-5 that
$$I = \lim_{k\to\infty} \int_3^k \frac{dx}{x^2},$$
so that by virtue of the result of Problem 7-5,
$$I = \lim_{k\to\infty} \left[-\frac{1}{k} + \frac{1}{3}\right] = \frac{1}{3}.$$
Hence this improper integral converges to the value 1/3.
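The contrast between the divergent integral of Example 7-6 (a) and the convergent integral of Example 7-7 can be seen by tabulating the proper integrals whose limits are taken in Definitions 7-4 and 7-5. The sketch below assumes the closed form $\int_a^b dx/x^2 = 1/a - 1/b$, valid only when the origin lies outside $[a, b]$, which is consistent with the values used in those examples; the function name is our own.

```python
def integral_inv_square(a, b):
    """Integral of 1/x**2 over [a, b] with a < b, valid only if 0 is not in [a, b]."""
    if a <= 0 <= b:
        raise ValueError("interval contains the singularity at x = 0")
    return 1 / a - 1 / b


# Definition 7-5: the upper limit k tends to infinity (Example 7-7).
for k in (10, 1_000, 1_000_000):
    print("k =", k, " integral from 3 to k =", integral_inv_square(3, k))

# Definition 7-4: delta tends to zero (second limit in Example 7-6 (a)).
for delta in (1e-1, 1e-3, 1e-6):
    print("delta =", delta, " integral from delta to 1 =", integral_inv_square(delta, 1))
```

The first group of values settles towards $1/3$, whereas the second grows without bound. Attempting `integral_inv_square(-1, 1)` raises an error, which mirrors the warning in the text against applying the formula across the origin.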
7-3 Integral inequalities
A number of useful inequalities may be deduced concerning definite integrals,
the simplest of which has already been stated in Eqn (7-1). Let us now derive
our first result of this type, of which Eqn (7-1) represents a special case.
Suppose that the definite integrals of $f(x)$ and $g(x)$ taken over the interval
$[a, b]$ both exist. In brief, let us agree to say that $f(x)$ and $g(x)$ are integrable
over the interval $[a, b]$. Now suppose that $f(x) \le g(x)$ for $a \le x \le b$. Then
if $P_m$ is a partition of $[a, b]$, we have from Theorem 7-2 that
$$\int_a^b g(x)\,dx - \int_a^b f(x)\,dx = \int_a^b (g(x) - f(x))\,dx = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} (g(\xi_i) - f(\xi_i))\Delta_i, \tag{7-20}$$
where $\xi_i$ is some point in the $i$th sub-interval of length $\Delta_i$ generated by the
partition $P_m$. Now since by hypothesis $f(x) \le g(x)$, it follows that $f(\xi_i) \le
g(\xi_i)$, so that the right-hand side of Eqn (7-20) must be non-negative. Thus
we have proved the following theorem.
Theorem 7-4 (inequality between two definite integrals) Let $f(x) \le g(x)$
be two integrable functions over the interval $[a, b]$. Then,
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$
Equation (7-1) follows as a trivial consequence of this result, for the
theorem implies that if $\phi(x) \le f(x) \le \psi(x)$ are three integrable functions
over the interval $[a, b]$, then
$$\int_a^b \phi(x)\,dx \le \int_a^b f(x)\,dx \le \int_a^b \psi(x)\,dx.$$
Hence, if $m$, $M$ are, respectively, the minimum and maximum values of $f(x)$
on $[a, b]$, our required result follows by setting $\phi(x) = m$, $\psi(x) = M$, when
we obtain
$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a). \tag{7-21}$$
This last simple result implies a more important result which we now
derive by appeal to the intermediate value theorem of Chapter 5. Writing
inequality (7-21) in the form
$$m \le \frac{1}{b - a} \int_a^b f(x)\,dx \le M$$
shows that the number
$$\frac{1}{b - a} \int_a^b f(x)\,dx$$
is intermediate between $m$ and $M$ which are extreme values of the function
$f(x)$ itself. Hence, provided $f(x)$ is continuous, it then follows from the
intermediate value theorem that some number $\xi$ exists, strictly between $a$ and $b$,
such that
$$f(\xi) = \frac{1}{b - a} \int_a^b f(x)\,dx. \tag{7-22}$$
This result is called the first mean value theorem for integrals, and it
constitutes our next theorem.
Theorem 7-5 (first mean value theorem for integrals) Let $f(x)$ be continuous
on the interval $[a, b]$, then there exists a number $\xi$, strictly between
$a$ and $b$, for which
$$\int_a^b f(x)\,dx = (b - a)f(\xi).$$
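For a specific function the number $\xi$ of Theorem 7-5 can be located numerically. The following sketch is a hedged illustration only: it takes $f(x) = x^2$ on $[1, 2]$, approximates the integral by a mid-point sum, and then finds $\xi$ by bisection, which relies on the additional assumption that $f$ is monotonic increasing on the interval. The names `midpoint_integral` and `mean_value_point` are hypothetical.

```python
def midpoint_integral(f, a, b, n=10_000):
    """Approximating sum with xi_i chosen as the mid-point of each sub-interval."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def mean_value_point(f, a, b, tol=1e-12):
    """Return (xi, mean) with f(xi) equal to the mean of f over [a, b].

    Assumption: f is continuous and monotonic increasing on [a, b], so that
    bisection on f(x) - mean is valid.
    """
    mean = midpoint_integral(f, a, b) / (b - a)
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, mean


xi, mean = mean_value_point(lambda x: x * x, 1.0, 2.0)
print("mean value of f:", mean)
print("xi:", xi)
```

Here the mean value is $7/3$ by Example 7-1, so that $\xi = \sqrt{7/3}$, which does indeed lie strictly between 1 and 2.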
Fig. 7-7 Area below $y = f(t)$ as a function of the upper limit of integration $x$.
7-4 The definite integral as a function of its upper
limit-indefinite integral
If the lower limit of a definite integral is held constant, but the upper limit is
replaced by the variable $x$, then the numerical value of the integral will clearly
depend on $x$. Another way of describing this situation is to say that a
definite integral with a variable upper limit $x$ defines a function of $x$. In Fig.
7-7 this idea is illustrated in terms of areas, with the shaded region marked
$F(x)$ denoting the area below the curve $y = f(t)$ which is bounded on the
left by the line $t = a$, and on the right by the line $t = x$.
In terms of the definite integral we have
$$F(x) = \int_a^x f(t)\,dt. \tag{7-23}$$
Now let us suppose that $f(t)$ is continuous in some interval $[a, b]$, with
$a \le x \le b$. Notice here that for the first time it is necessary to use the dummy
variable $t$, because $x$ and $t$ are fulfilling two different roles in Eqn (7-23). To be
precise, $x$ represents the upper limit of integration, whilst the dummy variable
$t$ represents the general variable in the interval of integration $a \le t \le x$.
Consider the difference
$$F(x + h) - F(x) = \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt = \int_x^{x+h} f(t)\,dt. \tag{7-24}$$
Then the first mean value theorem for integrals allows us to rewrite Eqn (7-24)
in the form
$$F(x + h) - F(x) = hf(\xi), \tag{7-25}$$
where $x < \xi < x + h$.
Now, forming the difference quotient $\{F(x + h) - F(x)\}/h$, we find
$$\frac{F(x + h) - F(x)}{h} = f(\xi),$$
so that taking the limit as $h \to 0$ gives,
$$F'(x) = \lim_{h\to 0} \left\{\frac{F(x + h) - F(x)}{h}\right\} = f(x). \tag{7-26}$$
This important result shows that the integrand of integral (7-23) at the
upper limit of integration $t = x$ is equal to the derivative of $F(x)$ with respect
to $x$.
Suppose now that $G(x)$ is any function for which $G'(x) = f(x)$. Then,
$$G'(x) - F'(x) = \frac{d}{dx}\left[G(x) - F(x)\right] = 0,$$
and so from Corollary 5-12
$$G(x) = F(x) + \text{constant}. \tag{7-27}$$
Combining Eqns (7-23) and (7-27) shows that the most general function
$G(x)$ whose derivative is equal to $f(x)$ must be of the form
$$G(x) = \int_a^x f(t)\,dt + C, \tag{7-28}$$
where $C$ is a constant.
The first term on the right-hand side of Eqn (7-28) is called an indefinite
integral. The function $G(x)$ itself is called either a primitive of $f$ or an
antiderivative of $f$. We shall usually use the name antiderivative, since this offers
an accurate description of the process by which it is to be found. Namely, an
antiderivative arises from the process of reversing the operation of differentiation,
and the most frequent method of finding antiderivatives utilizes
this idea by employing tables of derivatives in reverse. That is to say, by
matching an integrand with an entry in a table of derivatives and thereby
finding the functional form of $G(x)$ apart from the additive arbitrary constant.
Usually the antiderivative $G(x)$ defined in either Eqn (7-27) or Eqn (7-28)
is written symbolically in the form
$$\int f(x)\,dx = F(x) + C. \tag{7-29}$$
In this notation, the fact that an antiderivative is a function related to the
operation of integration, and not just a number as in an ordinary definite
integral, is indicated by again employing the integral sign, but this time without
limits. On occasions the reader will find books in which an antiderivative is
signified by the notation
f
f(x)dx,
rather than the notation used in Eqn (7-29).
The following short table lists a few of the antiderivatives which are of
most frequent occurrence in mathematics.
Table 7.1   $\int f(x)\,dx = F(x) + C$

No.   f(x)                F(x)
1     a (const)           ax
2     $x^n$               $x^{n+1}/(n+1)$
3     $e^{\lambda x}$     $e^{\lambda x}/\lambda$
4     sin x               $-\cos x$
5     cos x               sin x
Other useful elementary antiderivatives that should be memorized,
together with an account of systematic methods for finding antiderivatives,
are given in the next chapter.
Let us now return to Eqn (7-27) and notice that, since F(a) = 0, it follows
from this that
$$G(b) - G(a) = F(b) - F(a) = F(b) = \int_a^b f(x)\,dx. \tag{7-30}$$
Hence we have proved that
$$\int_a^b f(x)\,dx = G(b) - G(a), \tag{7-31}$$
where G'(x) = f(x). This provides a method for the evaluation of definite
integrals, for expressed in words it asserts that the definite integral of f(x)
taken over an interval [a, b] is the difference between the values of any
antiderivative of f(x) at x = b and x = a.
It is now time to express results (7-26) and (7-31) in the form of two basic
theorems known, respectively, as the first and second fundamental theorems
of calculus.
Theorem 7-6 (first fundamental theorem of calculus) If f(x) is continuous
for $a \le x \le b$, and
$$F(x) = \int_a^x f(t)\,dt,$$
then F'(x) = f(x) for all points x in [a, b].
Alternatively expressed, this result may also be written
$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x).$$
Theorem 7-7 (second fundamental theorem of calculus) If f(x) is continuous
for $a \le x \le b$ and G(x) is any antiderivative of f(x), then
$$\int_a^x f(t)\,dt = G(x) - G(a).$$
The statement of Theorem 7-7 is often written in the form
$$\int_a^b f(x)\,dx = G(x)\Big|_{x=a}^{x=b},$$
with the understanding that
$$G(x)\Big|_{x=a}^{x=b} = G(b) - G(a).$$
It follows from Theorem 7-7 that the definite integral calculated so
laboriously in Example 7-1 may be evaluated directly by appeal to entry
number 2 in Table 7.1. To see this set n = 2, so that $f(x) = x^2$, then
$F(x) = x^3/3$, and by Theorem 7-7 we immediately deduce that
$$\int_a^b x^2\,dx = \tfrac{1}{3}(b^3 - a^3).$$
The systematic employment of the fundamental theorems of calculus will
be taken up in detail in Chapter 8, since our concern here is primarily with
the theory rather than the practice of integration.
Finally, to emphasize that the indefinite integral is a function, we now
give an example of such an integral which defines an important mathematical
function. Since we have the relationship
$$\frac{d}{dx} \log_e x = \frac{1}{x}, \quad \text{for } x > 0,$$
it follows from Theorem 7-7 that, provided a > 0,
$$\int_a^x \frac{dt}{t} = \log_e x - \log_e a.$$
Hence, setting a = 1 gives the result
$$\log_e x = \int_1^x \frac{dt}{t}, \tag{7-32}$$
which is illustrated as the shaded area in Fig. 7-8.
Fig. 7-8 Natural logarithm represented as an area.
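The two uses of Theorem 7-7 made above can be compared with a direct
approximating sum. The sketch below is only an illustration: the limits 1 and
4, the value x = 2.5, and the helper name `riemann_sum` are assumptions chosen
for the demonstration.

```python
import math

def riemann_sum(f, a, b, n=100000):
    """Midpoint approximating sum of f over [a, b] with n equal sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Entry 2 of Table 7.1 with n = 2: G(x) = x**3 / 3 is an antiderivative of x**2.
a, b = 1.0, 4.0  # assumed limits
by_sum = riemann_sum(lambda x: x * x, a, b)
by_theorem = b ** 3 / 3 - a ** 3 / 3  # G(b) - G(a)

# Result (7-32): log_e x regarded as the area under 1/t between t = 1 and t = x.
x = 2.5  # assumed value
log_as_area = riemann_sum(lambda t: 1.0 / t, 1.0, x)

print(by_sum, by_theorem)
print(log_as_area, math.log(x))
```

In each printed pair the first number comes from summation and the second
from the antiderivative, and the two should agree to many decimal places.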
7-5 Differentiation of an integral containing a parameter
It can sometimes happen that an integrand, in addition to being a function of
x, also depends on a parameter $\alpha$. Furthermore, the upper and lower
limits of the integral may themselves be functions of $\alpha$, so that the
value of the integral must then itself depend on $\alpha$. Our concern in this
section will be with the differentiation, with respect to $\alpha$, of an
integral of the form
$$I(\alpha) = \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha)\,dx. \tag{7-33}$$
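To derive the form of our result let us begin by assuming that $\phi(\alpha)$,
$\psi(\alpha)$ are differentiable functions with respect to $\alpha$ in some
interval $c < \alpha < d$, and that $f(x, \alpha)$ is both integrable with
respect to x on the interval $[\phi(\alpha), \psi(\alpha)]$ and differentiable
with respect to $\alpha$. Then, first notice that from the mean value theorem
for derivatives, in $c < \alpha + h < d$, we have
$$\phi(\alpha+h) = \phi(\alpha) + h\left(\frac{d\phi}{d\alpha}\right)_{\alpha=\xi}, \quad \text{with } \alpha < \xi < \alpha + h;$$
$$\psi(\alpha+h) = \psi(\alpha) + h\left(\frac{d\psi}{d\alpha}\right)_{\alpha=\eta}, \quad \text{with } \alpha < \eta < \alpha + h; \tag{7-34}$$
$$f(x, \alpha+h) = f(x, \alpha) + h\left(\frac{\partial f}{\partial \alpha}\right)_{\alpha=\zeta}, \quad \text{with } \alpha < \zeta < \alpha + h.$$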
The partial derivative notation is needed in the last of these results because
for this application of the mean value theorem for derivatives we are regarding
the variable x as a constant.
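Now we have
$$I(\alpha+h) = \int_{\phi(\alpha+h)}^{\psi(\alpha+h)} f(x, \alpha+h)\,dx,$$
so that using results (7-34) we find
$$I(\alpha+h) = \int_{\psi(\alpha)}^{\psi(\alpha)+h\psi'} f(x, \alpha+h)\,dx + \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha+h)\,dx + \int_{\phi(\alpha)+h\phi'}^{\phi(\alpha)} f(x, \alpha+h)\,dx,$$
where $\phi'$ and $\psi'$ denote the derivatives of $\phi$ and $\psi$ evaluated
at the intermediate points $\xi$ and $\eta$ of (7-34).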
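An application of the mean value theorem for integrals (Theorem 7-5) to
the first and last terms then shows that
$$I(\alpha+h) = h\left(\frac{d\psi}{d\alpha}\right)_{\alpha=\eta} f(x', \alpha+h) + \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha+h)\,dx - h\left(\frac{d\phi}{d\alpha}\right)_{\alpha=\xi} f(x'', \alpha+h),$$
where $\psi(\alpha) < x' < \psi(\alpha) + h\psi'$ and $\phi(\alpha) < x'' < \phi(\alpha) + h\phi'$.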
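Next, forming the difference $I(\alpha+h) - I(\alpha)$, combining integrals and
using the final result of (7-34) gives
$$I(\alpha+h) - I(\alpha) = h\left(\frac{d\psi}{d\alpha}\right)_{\alpha=\eta} f(x', \alpha+h) + h\int_{\phi(\alpha)}^{\psi(\alpha)} \left(\frac{\partial f}{\partial \alpha}\right)_{\alpha=\zeta} dx - h\left(\frac{d\phi}{d\alpha}\right)_{\alpha=\xi} f(x'', \alpha+h). \tag{7-35}$$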
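Finally, forming the difference quotient $\{I(\alpha+h) - I(\alpha)\}/h$ and
taking the limit as $h \to 0$ it follows that $\xi$, $\eta$, and $\zeta$ all
tend to $\alpha$, whilst $x'$ tends to $\psi(\alpha)$ and $x''$ tends to
$\phi(\alpha)$, whence
$$\frac{dI}{d\alpha} = \left(\frac{d\psi}{d\alpha}\right) f(\psi, \alpha) - \left(\frac{d\phi}{d\alpha}\right) f(\phi, \alpha) + \int_{\phi(\alpha)}^{\psi(\alpha)} \frac{\partial f}{\partial \alpha}\,dx. \tag{7-36}$$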
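Theorem 7-8 (differentiation of an integral containing a parameter) Let
$\phi(\alpha)$, $\psi(\alpha)$ be differentiable functions with respect to
$\alpha$ in some interval $c < \alpha < d$, and let $f(x, \alpha)$ be both
integrable with respect to x over the interval
$\phi(\alpha) \le x \le \psi(\alpha)$ and differentiable with respect to
$\alpha$. Then,
$$\frac{d}{d\alpha}\int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha)\,dx = \left(\frac{d\psi}{d\alpha}\right) f(\psi, \alpha) - \left(\frac{d\phi}{d\alpha}\right) f(\phi, \alpha) + \int_{\phi(\alpha)}^{\psi(\alpha)} \frac{\partial f}{\partial \alpha}\,dx.$$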
A useful special case of this arises when $\phi(\alpha) = a$ and
$\psi(\alpha) = b$ are constants, so that the only dependence on the parameter
$\alpha$ is through the integrand $f(x, \alpha)$. The terms $d\phi/d\alpha$ and
$d\psi/d\alpha$ are then identically zero, so that we arrive at the following
corollary.
Corollary 7.8 If $f(x, \alpha)$ is both integrable with respect to x over the
interval [a, b] and differentiable with respect to $\alpha$, then
$$\frac{d}{d\alpha}\int_a^b f(x, \alpha)\,dx = \int_a^b \frac{\partial f}{\partial \alpha}\,dx.$$
Example 7-8 Apply the results of Theorem 7-8 to the following integral:
$$I(\alpha) = \int_{1+\cos\alpha}^{3+2\sin 3\alpha} \frac{dx}{x^2 + \alpha^2}.$$
Solution If we make the identifications $\phi(\alpha) = 1 + \cos\alpha$,
$\psi(\alpha) = 3 + 2\sin 3\alpha$, and $f(x, \alpha) = (x^2 + \alpha^2)^{-1}$,
it then follows directly from Theorem 7-8 that
$$\frac{dI}{d\alpha} = \frac{6\cos 3\alpha}{(3 + 2\sin 3\alpha)^2 + \alpha^2} + \frac{\sin\alpha}{(1 + \cos\alpha)^2 + \alpha^2} - 2\alpha\int_{1+\cos\alpha}^{3+2\sin 3\alpha} \frac{dx}{(x^2 + \alpha^2)^2}.$$
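The result of Example 7-8 can be tested numerically by comparing the formula
supplied by Theorem 7-8 with a central difference quotient of the integral
itself. This is a sketch under stated assumptions: the parameter value 0.5,
the step 1e-5, the use of a composite Simpson sum for the integrals, and all
function names are illustrative choices and not part of the text.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson estimate of the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def phi(alpha):
    """Lower limit of Example 7-8."""
    return 1 + math.cos(alpha)

def psi(alpha):
    """Upper limit of Example 7-8."""
    return 3 + 2 * math.sin(3 * alpha)

def I(alpha):
    """The integral of Example 7-8, evaluated numerically."""
    return simpson(lambda x: 1 / (x * x + alpha * alpha), phi(alpha), psi(alpha))

def dI_formula(alpha):
    """dI/d(alpha) as given by Theorem 7-8 for Example 7-8."""
    upper = 6 * math.cos(3 * alpha) / (psi(alpha) ** 2 + alpha ** 2)
    lower = math.sin(alpha) / (phi(alpha) ** 2 + alpha ** 2)
    inner = simpson(lambda x: 1 / (x * x + alpha * alpha) ** 2, phi(alpha), psi(alpha))
    return upper + lower - 2 * alpha * inner

alpha, h = 0.5, 1e-5  # assumed parameter value and step
central = (I(alpha + h) - I(alpha - h)) / (2 * h)
print(central, dI_formula(alpha))
```

The two printed values should agree closely, which gives some confidence that
the signs of the boundary terms have been assigned correctly.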
7-6 Other geometrical applications of definite integrals
This section offers a brief discussion of the application of the definite integral
to the determination of arc length for plane curves, the surface area of a
surface of revolution, and the volume of a volume of revolution. Each result
will be derived by appeal to the basic definition of a definite integral, since it
will first be necessary to define the precise meaning of the concepts that are
involved.
Fig. 7-9 (a) Arc length of curve; (b) element of arc length.
(a) Arc length of a plane curve
Consider the plane curve $\Gamma$ with the equation y = f(x) illustrated in
Fig. 7-9 (a). Then our task here will be first to define the meaning of the
length s of the arc MN, and then to deduce a method by which it may be
found once the equation of $\Gamma$ has been given. Let $Q_0, Q_1, \ldots, Q_n$
represent any set of points on $\Gamma$, the first of which coincides with the
left-hand end-point M, and the last of which coincides with the right-hand
end-point N. Then if $\Delta s_i$ denotes the length of the chord joining
$Q_{i-1}$ to $Q_i$, the length $S_n$ of the polygonal line joining M to N is
$$S_n = \sum_{i=1}^{n} \Delta s_i.$$
Now the projection of the set of points $Q_0, Q_1, \ldots, Q_n$ onto the x-axis
defines a set of points $a = x_0 < x_1 < \cdots < x_n = b$ which form a
partition $P_n$ of the interval [a, b]. Thus, denoting the norm of $P_n$ by
$\|\Delta\|_{P_n}$, we shall define the length s of the arc $\Gamma$ from M to
N to be
$$s = \lim_{\|\Delta\|_{P_n} \to 0} \sum_{i=1}^{n} \Delta s_i. \tag{7-37}$$
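Now, setting $\Delta_i = x_i - x_{i-1}$ and $\delta_i = f(x_i) - f(x_{i-1})$,
it follows directly by an application of Pythagoras' theorem (Fig. 7-9 (b)) that
$$\Delta s_i = \sqrt{\Delta_i^2 + \delta_i^2} = \sqrt{1 + \left(\frac{\delta_i}{\Delta_i}\right)^2}\,\Delta_i.$$
However, by virtue of the mean value theorem for derivatives we may write,
provided that f(x) is differentiable on [a, b],
$$\frac{\delta_i}{\Delta_i} = \frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}} = f'(\xi_i),$$
where $x_{i-1} < \xi_i < x_i$, and so
$$\Delta s_i = \sqrt{1 + [f'(\xi_i)]^2}\,\Delta_i. \tag{7-38}$$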
Thus the desired arc length s will be determined by evaluating
$$s = \lim_{\|\Delta\|_{P_n} \to 0} \sum_{i=1}^{n} \sqrt{1 + [f'(\xi_i)]^2}\,\Delta_i. \tag{7-39}$$
We see from Definition 7-2 that this is simply the definite integral of the
function $\sqrt{1 + [f'(x)]^2}$ integrated from x = a to x = b, and hence
$$s = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx. \tag{7-40}$$
Theorem 7-10 (arc length of plane curve) Let y = f(x) be a differentiable
function on the interval [a, b]. Then the length s of the plane curve $\Gamma$
defined by the graph of this function in the (x, y)-plane between the points
(a, f(a)), (b, f(b)) is given by
$$s = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx.$$
Example 7-9 Determine the length of arc of the curve y = cosh x between the
points (1, cosh 1) and (3, cosh 3).
Solution We have a = 1, b = 3, y = cosh x, and so dy/dx = sinh x,
whence
$$s = \int_1^3 \sqrt{1 + \sinh^2 x}\,dx = \int_1^3 \cosh x\,dx.$$
Now since d/dx (sinh x) = cosh x, it follows that sinh x + C is an
antiderivative of cosh x, so that by Theorem 7-7 we have
$$s = \int_1^3 \cosh x\,dx = (\sinh x + C)\Big|_1^3 = \sinh 3 - \sinh 1.$$
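The definition of arc length as the limit of polygonal lengths can be
illustrated on this example. The following sketch sums chord lengths over
equal sub-intervals and compares the totals with sinh 3 - sinh 1. The function
name `polygonal_length` and the choice of equal sub-intervals are assumptions
made for the illustration.

```python
import math

def polygonal_length(f, a, b, n):
    """Length S_n of the polygonal line through n + 1 equally spaced points on y = f(x)."""
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x0, x1 = a + (i - 1) * h, a + i * h
        # Chord length from Pythagoras' theorem, as for the element of arc in Fig. 7-9 (b).
        total += math.hypot(x1 - x0, f(x1) - f(x0))
    return total

exact = math.sinh(3) - math.sinh(1)
for n in (4, 40, 400):
    print(n, polygonal_length(math.cosh, 1.0, 3.0, n), exact)
```

Each polygonal length falls slightly short of the exact value, and the
shortfall should shrink as n grows.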
Fig. 7-10 Length of parametrically defined curve $\Gamma$.
Theorem 7-10 will fail for curves $\Gamma$ of the type shown in Fig. 7-10, for
any representation of the function in the form y = f(x) will not be single
valued on the interval $[\alpha, \beta]$, and so it will not be differentiable
there.
The difficulty here is easily overcome by using the fact that each point on
the curve $\Gamma$ can be uniquely defined and a unique derivative assigned if
the curve $\Gamma$ is capable of parametric representation in the form
$$x = \phi(t), \quad y = \psi(t) \quad \text{for } T_0 \le t \le T_1, \tag{7-41}$$
with $\phi(t)$, $\psi(t)$ differentiable on $[T_0, T_1]$.
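Using the result for parametric differentiation
$$\frac{dy}{dx} = \frac{\psi'(t)}{\phi'(t)}$$
in Eqn (7-39), and then employing the differential relationship
$\Delta x = \phi'(t)\,\Delta t$ to define $\Delta_i$ in terms of $\Delta t_i$,
we find that
$$s = \lim_{\|\Delta\|_{P_n} \to 0} \sum_{i=1}^{n} \sqrt{1 + \left[\frac{\psi'(\xi_i)}{\phi'(\xi_i)}\right]^2}\,\phi'(\xi_i)\,\Delta t_i, \tag{7-42}$$
where $t_{i-1} < \xi_i < t_i$.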
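Thereafter, the argument that gave rise to Eqn (7-40) now gives rise to
$$s = \int_{T_0}^{T_1} \sqrt{1 + \left[\frac{\psi'(t)}{\phi'(t)}\right]^2}\,\phi'(t)\,dt = \int_{T_0}^{T_1} \sqrt{[\phi'(t)]^2 + [\psi'(t)]^2}\,dt. \tag{7-43}$$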
Theorem 7-11 (arc length of parametrically defined curve) Let $\phi(t)$,
$\psi(t)$ be differentiable functions in $T_0 \le t \le T_1$. Then the length s
of the plane curve defined parametrically by $x = \phi(t)$, $y = \psi(t)$
between the points $(\phi(T_0), \psi(T_0))$, $(\phi(T_1), \psi(T_1))$ is given by
$$s = \int_{T_0}^{T_1} \sqrt{[\phi'(t)]^2 + [\psi'(t)]^2}\,dt.$$
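As an illustration of Theorem 7-11, the sketch below estimates the length of
one arch of the cycloid x = t - sin t, y = 1 - cos t for t between 0 and
2 pi, whose length is known to be 8. The cycloid is an example chosen here and
does not appear in the text, and the helper name `parametric_length` together
with the midpoint summation are likewise assumptions.

```python
import math

def parametric_length(dphi, dpsi, t0, t1, n=1000):
    """Midpoint-sum estimate of the integral in Theorem 7-11.

    dphi and dpsi are the derivatives of x = phi(t) and y = psi(t).
    """
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        total += math.hypot(dphi(t), dpsi(t))
    return h * total

# Cycloid (assumed example): phi'(t) = 1 - cos t, psi'(t) = sin t.
length = parametric_length(lambda t: 1 - math.cos(t), math.sin, 0.0, 2 * math.pi)
print(length)
```

Note that this curve has phi'(t) = 0 at both ends of the arch, so the first
form in (7-43) is awkward there, whereas the symmetric form used in the
theorem causes no difficulty.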
(b) Area of surface of revolution
The name surface of revolution is given to any surface which is generated by
rotating a plane curve y = f(x) about either the x-axis or the y-axis. Since
the determination of the area in either case is exactly similar, we shall discuss
only the case of the revolution of the curve y = f(x) about the x-axis, as
shown in Fig. 7-11.
A problem arises here as to how to define the area of a non-cylindrical
curved surface. We propose to approach the problem by sectioning the surface
into annular strips of width $\Delta_i$ as shown in Fig. 7-11, and then to
approximate the area of each such annular strip by the conical area which is
obtained by rotating the chord PQ of length $\Delta s_i$ about the x-axis.
Then if this element of area of cone between the planes $x = x_{i-1}$ and
$x = x_i$ is $\Delta S_i$, this will be given by
$$\Delta S_i = 2\pi\left(\frac{y_{i-1} + y_i}{2}\right)\Delta s_i. \tag{7-44}$$
Fig. 7-11 Area of surface of revolution.
Similar elements of area may be defined for each of the other annular
strips defined by some partition $P_n$ of the interval [a, b] by the set of
points $a = x_0 < x_1 < \cdots < x_n = b$. Thus, denoting the norm of $P_n$ by
$\|\Delta\|_{P_n}$, we shall define the area S of the surface of revolution
generated by rotating y = f(x) about the x-axis, and contained between the
planes x = a and x = b, to be
$$S = \lim_{\|\Delta\|_{P_n} \to 0} \sum_{i=1}^{n} \Delta S_i = \lim_{\|\Delta\|_{P_n} \to 0} \pi \sum_{i=1}^{n} (y_{i-1} + y_i)\,\Delta s_i. \tag{7-45}$$
Hence, if f(x) is differentiable in $a \le x \le b$, by using result (7-38) we find
$$S = \lim_{\|\Delta\|_{P_n} \to 0} \pi \sum_{i=1}^{n} (y_{i-1} + y_i)\sqrt{1 + [f'(\xi_i)]^2}\,\Delta_i, \tag{7-46}$$
where $x_{i-1} < \xi_i < x_i$.
Once again our previous form of argument shows that this is just the
definite integral of the function $2\pi f(x)\sqrt{1 + [f'(x)]^2}$ integrated
from x = a to x = b, and so
$$S = 2\pi \int_a^b f(x)\sqrt{1 + [f'(x)]^2}\,dx. \tag{7-47}$$
Theorem 7-12 (area of surface of revolution) Let f(x) be a differentiable
function on $a \le x \le b$. Then the area S of the surface of revolution
generated by rotating the graph of the function y = f(x) about the x-axis, and
contained between the planes x = a and x = b, is given by
$$S = 2\pi \int_a^b f(x)\sqrt{1 + [f'(x)]^2}\,dx.$$
Example 7-10 Find the area contained between the planes x = — 1 and
x = 2 of the surface of revolution about the x-axis of the curve y = cosh x.
Solution We have a = -1, b = 2, and f(x) = cosh x, and so f'(x) = sinh x,
whence
$$S = 2\pi \int_{-1}^{2} \cosh x \sqrt{1 + \sinh^2 x}\,dx = 2\pi \int_{-1}^{2} \cosh^2 x\,dx.$$
To evaluate this result we now use the hyperbolic identity
$\cosh^2 x = \tfrac{1}{2}(1 + \cosh 2x)$ to obtain
$$S = \pi \int_{-1}^{2} (1 + \cosh 2x)\,dx.$$
Then, as it is easily verified that $\tfrac{1}{2}\sinh 2x + C$ is an
antiderivative of cosh 2x, we have from Theorem 7-7 that
$$S = \pi \int_{-1}^{2} (1 + \cosh 2x)\,dx = \pi\left(x + \tfrac{1}{2}\sinh 2x + C\right)\Big|_{-1}^{2} = \tfrac{1}{2}\pi(6 + \sinh 4 + \sinh 2).$$
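The value just obtained may be compared with the sum of conical elements used
in (7-45) to define the area. The sketch below forms that sum for y = cosh x
on equal sub-intervals. The function name `conical_area_sum` and the choice of
2000 sub-intervals are assumptions made for the illustration.

```python
import math

def conical_area_sum(f, a, b, n):
    """pi * sum of (y_{i-1} + y_i) * delta_s_i over n equal sub-intervals, as in (7-45)."""
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x0, x1 = a + (i - 1) * h, a + i * h
        y0, y1 = f(x0), f(x1)
        chord = math.hypot(x1 - x0, y1 - y0)  # delta_s_i, the chord PQ
        total += (y0 + y1) * chord
    return math.pi * total

exact = 0.5 * math.pi * (6 + math.sinh(4) + math.sinh(2))
approx = conical_area_sum(math.cosh, -1.0, 2.0, 2000)
print(approx, exact)
```

The two printed numbers should differ only in the later decimal places.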
(c) Volume of revolution
Finally, let us determine the volume of revolution V of the volume shown in
Fig. 7-11. This time, to define the volume of such a figure, we consider
cylindrical elements of volume of thickness $\Delta_i$, and place upper and
lower bounds on that element of volume by the obvious inequality:
$$\pi \times (\text{least radius of annulus})^2 \times \Delta_i \le \text{element of volume} \le \pi \times (\text{greatest radius of annulus})^2 \times \Delta_i.$$
Then, if $x_{i-1} < \xi_i < x_i$, a volume element $\Delta V_i$ satisfying this
inequality and bounded to the left by the plane $x = x_{i-1}$ and to the right
by the plane $x = x_i$ is
$$\Delta V_i = \pi [f(\xi_i)]^2 \Delta_i. \tag{7-48}$$
The volume of revolution generated by rotating y = f(x) about the x-axis,
and contained between the planes x = a and x = b will then be defined to be
$$V = \lim_{\|\Delta\|_{P_n} \to 0} \pi \sum_{i=1}^{n} [f(\xi_i)]^2 \Delta_i. \tag{7-49}$$
A repetition of the previous form of argument then yields
$$V = \pi \int_a^b [f(x)]^2\,dx. \tag{7-50}$$
Notice that we have imposed no differentiability requirements on f(x),
so that result (7-50) is applicable even if f(x) is only piecewise continuous.
Theorem 7-13 (volume of solid of revolution) Let f(x) be a piecewise
continuous function on $a \le x \le b$. Then the volume of the solid of
revolution generated by rotating the curve y = f(x) about the x-axis, and
contained between the planes x = a and x = b, is given by
$$V = \pi \int_a^b [f(x)]^2\,dx.$$
Example 7-11 Determine the volume of revolution generated by rotating the
parabola $y = 1 + x^2$ about the x-axis, and contained between the planes
x = 1 and x = 2.
Solution Here we have a = 1, b = 2, and $f(x) = 1 + x^2$, so that
$$V = \pi \int_1^2 (1 + x^2)^2\,dx = \pi \int_1^2 (1 + 2x^2 + x^4)\,dx = \pi\left(x + \frac{2x^3}{3} + \frac{x^5}{5}\right)\Big|_1^2 = \frac{178\pi}{15}.$$
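The sum of cylindrical elements in (7-49) can be formed directly for this
example. In the sketch below the points of evaluation are taken at the
mid-points of equal sub-intervals, which is one admissible choice among many.
The function name `disc_volume_sum` and the number of sub-intervals are
assumptions made for the illustration.

```python
import math

def disc_volume_sum(f, a, b, n):
    """pi * sum of f(xi_i)**2 * delta_i over n equal sub-intervals, xi_i at the mid-points."""
    h = (b - a) / n
    return math.pi * h * sum(f(a + (i + 0.5) * h) ** 2 for i in range(n))

exact = 178 * math.pi / 15
approx = disc_volume_sum(lambda x: 1 + x * x, 1.0, 2.0, 1000)
print(approx, exact)
```

With 1000 elements the sum should already agree with the exact value to
several decimal places.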
7-7 Numerical integration
From the second fundamental theorem of calculus we have seen that the
successful analytical evaluation of a definite integral involves the deter-
mination of an antiderivative of the integrand. Although in many practical
cases of importance an antiderivative can be found, the fact remains that in
general this is not possible and Theorem 7-7 is therefore of no avail. Such,
for example, is the case with an integral as simple as
$$\int e^{-x^2}\,dx,$$
for although an antiderivative of $e^{-x^2}$ certainly exists on theoretical
grounds, it is not expressible in terms of elementary functions.
Of the many possible methods whereby a numerical estimate of the value
of a definite integral may be made, we choose to mention only the very
simplest ones here. The general process of evaluating a definite integral by
numerical means will be referred to as numerical integration, though the
old-fashioned term numerical quadrature is still often employed for such a
process. The matter of the accuracy of these methods will be taken up
elsewhere in connection with applications of Taylor's theorem.
Fig. 7-12 Trapezoidal approximation of area.
(a) Trapezoidal rule
Although a strictly analytical derivation of the so-called trapezoidal rule
for integration may be given we shall not use this approach, and instead
make appeal to the area representation of a definite integral. Consider Fig.
7-12, and let us estimate the shaded area below the curve y = f(x) which we
know has the value
$$\int_a^b f(x)\,dx.$$
Let us begin by taking any set of n + 1 points $a = x_0 < x_1 < \cdots < x_n = b$,
and on each interval $[x_{i-1}, x_i]$ approximate the true area above
it by the trapezium obtained by replacing the arc of the curve through the
points $(x_{i-1}, f(x_{i-1}))$, $(x_i, f(x_i))$ by the chord joining these two
points. Then the area of the trapezium on the interval $[x_{i-1}, x_i]$ is
$$\tfrac{1}{2}\,(f(x_{i-1}) + f(x_i))\,\Delta x_i,$$
where $\Delta x_i = x_i - x_{i-1}$.
Thus, adding the n contributions of this type, we arrive at the general
trapezoidal rule
$$\int_a^b f(x)\,dx \approx \tfrac{1}{2}(f(x_0) + f(x_1))\Delta x_1 + \tfrac{1}{2}(f(x_1) + f(x_2))\Delta x_2 + \cdots + \tfrac{1}{2}(f(x_{n-1}) + f(x_n))\Delta x_n. \tag{7-51}$$
If the interval [a, b] is divided into n equal parts of length h = (b - a)/n,
then (7-51) becomes the trapezoidal rule for equal intervals
$$\int_a^b f(x)\,dx = h\left[\tfrac{1}{2} f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + \tfrac{1}{2} f(x_n)\right] + \epsilon(h), \tag{7-52}$$
where an equality sign has now been used because we have included the
error term $\epsilon(h)$, which recognizes that the error is, in part,
dependent on the magnitude of h.
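The trapezoidal rule for equal intervals translates directly into a short
routine. The sketch below omits the error term and simply evaluates the
bracketed sum of (7-52). The function name `trapezoidal` and the choice of
test integrand 1/x on [1, 2] with ten intervals are illustrative assumptions.

```python
def trapezoidal(f, a, b, n):
    """Trapezoidal rule (7-52) without the error term, using n equal intervals."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))  # f(x_1) + ... + f(x_{n-1})
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

estimate = trapezoidal(lambda x: 1.0 / x, 1.0, 2.0, 10)
print(round(estimate, 4))
```

Since the end values carry weight one half and every interior value weight
one, only n + 1 evaluations of the integrand are needed.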
(b) Simpson's rule
A different approach involves dividing [a, b] into an even number n of
sub-intervals of equal length h = (b - a)/n, and then approximating the
function over consecutive pairs of sub-intervals by a quadratic polynomial.
That is to say, fitting a parabola to the three points (a, f(a)),
(a + h, f(a + h)), (a + 2h, f(a + 2h)) comprising the first two sub-intervals,
and thereafter repeating the process until the whole of the interval [a, b]
has been covered. The value of the definite integral can then be estimated by
integrating the successive quadratic approximations over their respective
intervals of length 2h and adding the results. This simple idea leads to
Simpson's rule for numerical integration which we now formulate in analytical
terms.
Consider the first interval [a, a + 2h], and represent the function y = f(x)
in this interval by the quadratic
$$y = c_0 + c_1 x + c_2 x^2. \tag{7-53}$$
Then the approximation to the desired integral taken over this interval is
$$\int_a^{a+2h} f(x)\,dx \approx \int_a^{a+2h} (c_0 + c_1 x + c_2 x^2)\,dx = \left(c_0 x + \frac{c_1 x^2}{2} + \frac{c_2 x^3}{3}\right)\Big|_a^{a+2h}. \tag{7-54}$$
To determine the coefficients $c_0$, $c_1$, and $c_2$ in order that the
quadratic should pass through the three points (a, f(a)), (a + h, f(a + h)),
(a + 2h, f(a + 2h)) we must solve the three simultaneous equations
$$f(a) = c_0 + c_1 a + c_2 a^2,$$
$$f(a+h) = c_0 + c_1 (a+h) + c_2 (a+h)^2,$$
$$f(a+2h) = c_0 + c_1 (a+2h) + c_2 (a+2h)^2. \tag{7-55}$$
When this is done and the results are substituted into Eqn (7-54) we arrive at
the desired result
$$\int_a^{a+2h} f(x)\,dx = \frac{h}{3}\,(f(a) + 4f(a+h) + f(a+2h)) + \epsilon(h), \tag{7-56}$$
where again we have denoted the error term by $\epsilon(h)$. In its simplest
form Eqn (7-56), together with its error term, is called Simpson's rule. An
explicit form for $\epsilon(h)$ in both the trapezoidal rule and Simpson's rule
will be given later.
If, now, result (7-56) is applied to the intervals [a, a + 2h],
[a + 2h, a + 4h], . . ., [a + (n - 2)h, b] and the results are added, we arrive
at Simpson's rule for an even number n of intervals
$$\int_a^b f(x)\,dx = \frac{h}{3}\,[f(a) + 4f(a+h) + 2f(a+2h) + 4f(a+3h) + \cdots + 4f(a+(n-1)h) + f(b)] + \epsilon(h), \tag{7-57}$$
where h = (b - a)/n.
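Simpson's rule for an even number of intervals can be sketched in the same
way as the trapezoidal rule. The routine below evaluates the bracketed sum of
(7-57) with weights 1, 4, 2, 4, ..., 4, 1 and leaves out the error term. The
function name `simpson`, the rejection of odd n by an exception, and the test
integrand 1/x on [1, 2] are illustrative assumptions.

```python
import math

def simpson(f, a, b, n):
    """Simpson's rule (7-57) without the error term; n must be even."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd-numbered interior points carry weight 4, even-numbered ones weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return h * total / 3

estimate = simpson(lambda x: 1.0 / x, 1.0, 2.0, 10)
print(estimate, math.log(2))
```

For this integrand the estimate should agree with the natural logarithm of 2
to about five decimal places.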
Example 7-12 Calculate the definite integral
$$I = \int_1^2 \frac{dx}{x}$$
by the trapezoidal rule and by Simpson's rule, taking ten integration steps of
length h = 0.1.
Solution We start by tabulating the functional values of the integrand 1/x
at intervals of 0.1.

x      1/x
1.0    1.0000
1.1    0.9091
1.2    0.8333
1.3    0.7692
1.4    0.7143
1.5    0.6667
1.6    0.6250
1.7    0.5882
1.8    0.5556
1.9    0.5263
2.0    0.5000
Then, using the trapezoidal rule (7-52), we find
$$I \approx 0.1 \times [0.5000 + 0.9091 + 0.8333 + 0.7692 + 0.7143 + 0.6667 + 0.6250 + 0.5882 + 0.5556 + 0.5263 + 0.25],$$
whence $I \approx 0.6938$.
The same calculation using Simpson's rule, (7-57), gives
$$I \approx \frac{0.1}{3} \times [1.0000 + 4 \times (0.9091) + 2 \times (0.8333) + 4 \times (0.7692) + 2 \times (0.7143) + 4 \times (0.6667) + 2 \times (0.6250) + 4 \times (0.5882) + 2 \times (0.5556) + 4 \times (0.5263) + 0.5000],$$
whence $I \approx 0.6932$.
In actual fact the exact result of this definite integral is
$\log_e 2 = 0.69315$. As would have been expected on intuitive grounds,
Simpson's rule is more accurate than the trapezoidal rule.
(c) Integration of interpolating polynomials
A direct extension of the previous method that may be exploited system-
atically to produce integration formulae of high accuracy and flexibility
involves the replacement of the function y = f(x) over the interval [a, b] by
an interpolating polynomial of degree n. Thus, on the interval [a, b], the
function y = f(x) is represented by
$$y = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n, \tag{7-58}$$
and the numerical integration formula then follows by writing
$$\int_a^b f(x)\,dx \approx \int_a^b (c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n)\,dx. \tag{7-59}$$
Thus, if the error term is again represented by $\epsilon(h)$, we obtain the
numerical integration formula
$$\int_a^b f(x)\,dx = c_0 (b - a) + \frac{c_1}{2}(b^2 - a^2) + \frac{c_2}{3}(b^3 - a^3) + \cdots + \frac{c_n}{n+1}(b^{n+1} - a^{n+1}) + \epsilon(h). \tag{7-60}$$
The difficulty in this approach arises from the fact that the sense in which
Eqn (7-58) is to approximate y = f(x) is still to be defined, and this will
influence both the method by which the n + 1 coefficients
$c_0, c_1, \ldots, c_n$ are to be determined and, naturally, the error term
$\epsilon(h)$.
Probably the simplest choice of approximating polynomial, and the only
one to be discussed here, is determined by the requirement that the
polynomial and the function should have identical values at n + 1 points
$x_0 < x_1 < \cdots < x_n$ belonging to [a, b]. That is, the requirement that
the graph of Eqn (7-58) should pass through the n + 1 points
$(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n))$. Such a polynomial is
called a Lagrangian interpolation polynomial, and its form may be written down
directly as follows. We illustrate the Lagrangian interpolation polynomial
$L_3(x)$ of degree 3, which passes through the four points $(x_0, f(x_0))$,
$(x_1, f(x_1))$, $(x_2, f(x_2))$, and $(x_3, f(x_3))$. Higher degree
polynomials may be constructed in a similar manner.
$$L_3(x) = \frac{(x - x_1)(x - x_2)(x - x_3)}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)}\,f(x_0) + \frac{(x - x_0)(x - x_2)(x - x_3)}{(x_1 - x_0)(x_1 - x_2)(x_1 - x_3)}\,f(x_1)$$
$$\qquad + \frac{(x - x_0)(x - x_1)(x - x_3)}{(x_2 - x_0)(x_2 - x_1)(x_2 - x_3)}\,f(x_2) + \frac{(x - x_0)(x - x_1)(x - x_2)}{(x_3 - x_0)(x_3 - x_1)(x_3 - x_2)}\,f(x_3). \tag{7-61}$$
This form of approach to the development of an integration formula is
essential when, as is often the case, the function f(x) is only known in
tabular form.
Example 7-13 Given the following tabular values of a function f(x), derive
the Lagrangian interpolation formula $L_3(x)$ for f(x).

r    $x_r$    $f(x_r)$
0    2        2.131
1    4        1.242
2    6        4.507
3    7        9.702
Solution It follows by direct substitution into Eqn (7-61) that
$$L_3(x) = \frac{(x-4)(x-6)(x-7)}{(-2)(-4)(-5)} \times (2.131) + \frac{(x-2)(x-6)(x-7)}{(2)(-2)(-3)} \times (1.242)$$
$$\qquad + \frac{(x-2)(x-4)(x-7)}{(4)(2)(-1)} \times (4.507) + \frac{(x-2)(x-4)(x-6)}{(5)(3)(1)} \times (9.702).$$
Simplification of this will yield the required third degree polynomial
which may, if desired, then be integrated over any sub-interval of the interval
[2, 7] on which f(x) is defined, thereby yielding an approximation to the
definite integral of f(x) integrated over that same sub-interval.
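The construction of the Lagrangian interpolation polynomial can be sketched
for any number of tabulated points. The routine below builds the polynomial as
a function and checks that it reproduces the tabulated values. The function
name `lagrange` is an assumption, and the data are the values of Example 7-13
as read from the table, with the first row taken to be r = 0.

```python
def lagrange(xs, ys):
    """Return the Lagrangian interpolation polynomial through the points (xs[r], ys[r])."""
    def L(x):
        total = 0.0
        for r, (xr, yr) in enumerate(zip(xs, ys)):
            term = yr
            for s, xs_s in enumerate(xs):
                if s != r:
                    # One factor (x - x_s) / (x_r - x_s) for every other tabular point.
                    term *= (x - xs_s) / (xr - xs_s)
            total += term
        return total
    return L

xs = [2.0, 4.0, 6.0, 7.0]
ys = [2.131, 1.242, 4.507, 9.702]
L3 = lagrange(xs, ys)
for x in xs:
    print(x, L3(x))   # each tabulated value should be reproduced
print(5.0, L3(5.0))   # an interpolated value between tabular points
```

Because each term vanishes at every tabular point except its own, the
polynomial passes through the data without any simultaneous equations having
to be solved.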
PROBLEMS
Section 7-1
7-1 Let $f(x) = \lambda x$ on some closed interval $a \le x \le b$ lying in the
positive part of the x-axis, where $\lambda > 0$ is a constant. Then, if $P_n$
is a partition of [a, b] into n sub-intervals of equal length, determine the
form of the lower and upper sums $\underline{S}_{P_n}$, $\overline{S}_{P_n}$
for f(x) taken over this partition and prove directly by taking the limit that
$$\lim_{n \to \infty} \underline{S}_{P_n} = \lim_{n \to \infty} \overline{S}_{P_n}.$$
Hence deduce that
$$\int_a^b \lambda x\,dx = \frac{\lambda}{2}(b^2 - a^2).$$
7-2 Let $\lambda, \mu > 0$ be constants, and set $f(x) = \mu + \lambda x$ on
some closed interval $a \le x \le b$ lying in the positive part of the x-axis.
Show, using the method of Problem 7-1, that
$$\int_a^b (\mu + \lambda x)\,dx = \mu(b - a) + \frac{\lambda}{2}(b^2 - a^2). \tag{A}$$
Show also by this method that
$$\int_a^b \mu\,dx = \mu(b - a), \tag{B}$$
and deduce from (A), (B) and the result of Problem 7-1 that
$$\int_a^b (\mu + \lambda x)\,dx = \int_a^b \mu\,dx + \int_a^b \lambda x\,dx.$$
This provides a direct proof of the linearity of the operation of integration in
the special case that $f(x) = \mu + \lambda x$.
7-3 Let $f(x) = e^{\lambda x}$, and take $P_n$ to be a partition of the closed
interval [a, b] into n sub-intervals of equal length. By taking the numbers
$\xi_i$ of Definition 7-1 to be at the left-hand end points of the
sub-intervals, compute the approximating sum $S_{P_n}$ corresponding to
$f(x) = e^{\lambda x}$, and by finding its limit prove that
$$\int_a^b e^{\lambda x}\,dx = \frac{1}{\lambda}(e^{\lambda b} - e^{\lambda a}).$$
7-4 If a < k < b, use the result of Problem 7-3 to deduce that
$$\int_a^b e^{\lambda x}\,dx = \int_a^k e^{\lambda x}\,dx + \int_k^b e^{\lambda x}\,dx.$$
This provides a direct proof that the operation of integration is additive with
respect to the interval of integration in the special case that
$f(x) = e^{\lambda x}$.
7-5 Let [a, b] be any closed interval not containing the origin, and denote by
$P_m$ the partition of this interval into m equal sub-intervals each of length
(b - a)/m. Denote by $x_r$ the point $x_r = a + (r/m)(b - a)$ lying at the
right-hand end point of the rth interval. Then, by setting
$\xi_r = \sqrt{x_{r-1} x_r}$ show, by considering $x_{r-1} - \xi_r$ and
$x_r - \xi_r$, that $x_{r-1} < \xi_r < x_r$. By writing $f(x) = 1/x^2$ in
Definition 7-2, and taking $P_m$ and the points $\xi_r$ in that definition to
be as defined above, prove that
$$\int_a^b \frac{dx}{x^2} = \left(\frac{1}{a} - \frac{1}{b}\right).$$
Hint: Use the fact that
$$\sum_{r=1}^{m} \frac{1}{x_{r-1} x_r} = \sum_{r=1}^{m} \left(\frac{1}{x_r - x_{r-1}}\right)\left(\frac{1}{x_{r-1}} - \frac{1}{x_r}\right).$$
7-6 Determine the lower bounds $m_r$ and the upper bounds $M_r$ of the function
$f(x) = 1/(1 + x^2)$ in each of the n adjacent sub-intervals of length 1/n
comprising a partition $P_n$ of the closed interval [0, 1]. Use these results
to deduce the form taken by the upper and lower sums $\overline{S}_{P_n}$,
$\underline{S}_{P_n}$ and show that
$$\lim_{n \to \infty} (\overline{S}_{P_n} - \underline{S}_{P_n}) = 0.$$
Deduce from this that
$$\int_0^1 \frac{dx}{1 + x^2} = \lim_{n \to \infty} n\left\{\frac{1}{n^2 + 1^2} + \frac{1}{n^2 + 2^2} + \frac{1}{n^2 + 3^2} + \cdots + \frac{1}{n^2 + n^2}\right\}$$
or, equivalently,
$$\int_0^1 \frac{dx}{1 + x^2} = \lim_{n \to \infty} n\left\{\frac{1}{n^2} + \frac{1}{n^2 + 1^2} + \frac{1}{n^2 + 2^2} + \cdots + \frac{1}{n^2 + (n-1)^2}\right\}.$$
We shall see later that this integral has the value $\tfrac{1}{4}\pi$, and so
each of these different expressions has this same interesting limit.
Section 7-2
7-7 Outline the proofs of the results of Theorem 7-3.
7-8 If f(x) = 2x - 3, use result (A) of Problem 7-2 to evaluate the definite
integral
I
(2x - 3)dx.
Rewrite this as the sum of two definite integrals each with a non-negative inte-
grand and verify that their sum leads to the same result.
7-9 Use the result of Problem 7-3 to evaluate the definite integral
r2
I
e~ 3x dx.
'4
7-10 Find the area I between the curves $y = x^2 + 2$ and $y = -x + 1$, which is
bounded to the left by the line x = -1 and to the right by the line x = 2.
7-11 Discuss, without attempting to evaluate any integrals that are involved, the
problem of determining the area between the curves y = 1 + sin x and
y = 1 + cos x which is bounded to the left by the line x = 0 and to the right
by the line $x = 2\pi$.
7-12 Find the area / between the two curves y = 1/x 2 and y = e 05x — 3, which is
bounded to the left by the line x = 1 and to the right by the line x = 2.
7-13 Evaluate the integral
"*/W dx,
-f
given that
Ix for < x < 1 ;
/(*)= 2 + 2x for 1 <x<2;
U - 1 for 2 < x < 3.
7-14 On the assumption that the definite integral
$$\int_a^b \frac{dx}{\sqrt{1 - x^2}} = \arcsin b - \arcsin a,$$
prove that the improper integral
$$\int_0^1 \frac{dx}{\sqrt{1 - x^2}}$$
is convergent, and determine its value.
7-15 Sketch the area bounded below by the positive x-axis, and above by the line
y = x on the interval $0 \le x \le 1$, and by the curve $y = 1/x^2$ on the
interval $1 \le x < \infty$. Determine this area I by the use of an improper
integral combined with elementary geometrical arguments.
Section 7-3
7-16 Use Theorem 7-4 to place bounds on the value of the definite integral
/ = I e - * 2 cos 3 x dx.
= '
7-17 Evaluate the definite integral
x 2 dx,
1
and use the result to determine the number $\xi$ in Theorem 7-5 when it is
applied to this definite integral. Is the number $\xi$ unique? Repeat the
argument, but this time applying it to the definite integral
i
2
x 2 dx.
2
Is there a unique number $\xi$ in this case?
7-18 Prove the following result which is a restricted form of the second mean value
theorem for integrals. Let $f(x) > 0$ be continuous and monotonic decreasing
on [a, b], and let $g(x) > 0$ be continuous on [a, b]. Then,
$$\int_a^b f(x) g(x)\,dx = f(a) \int_a^{\xi} g(x)\,dx,$$
where $a < \xi < b$. State the corresponding form of the theorem when
$f(x) > 0$ is continuous and monotonic increasing on [a, b]. [Hint: Consider
the integrand f(a){f(x)g(x)/f(a)} and use Theorem 7-4.]
7-19 The requirement of continuity for f(x) in Theorem 7-5 is essential, for
without it the result of the theorem may, or may not, be true. Illustrate this
by considering step functions f(x) defined on the interval [1, 4], and show
that it is possible to define ones for which,
(a) no number $\xi$ exists which satisfies Theorem 7-5;
(b) an infinity of numbers $\xi$ exist satisfying Theorem 7-5.
Section 7-4
7-20 Use Theorem 7-7 to evaluate the following definite integrals:
(a) (x 5 ' 2 + 3e*)dx, (b) sin x dx, (c) sin x dx,
rb f" r%«
(x 5 ' 2 + 3e*)dx, (b) sin x dx, (c) si
Ja Jo Jo
f
Jo
(d) | sin x | dx.
Jo
7-21 Use Theorem 7-7 to determine the area contained between the x-axis and the
curve $y = 1 + x^3 + 2\sin x$, which is bounded to the left by the line x = 0
and to the right by the line $x = \pi$.
7-22 Using the basic properties of the logarithmic function listed in Section 6-3,
express logo x in terms of an indefinite integral, and sketch the interpreta-
tion of the result as an area below a curve.
Section 7-5
7-23 Apply Theorem 7-8 to the following integral, but do not attempt to evaluate
the result :
Ja
l + o 2
t~ x cos ax dx.
7-24 Apply Theorem 7-8 to the following integral, but do not attempt to evaluate
the result:
j* 00
1(a) = x"-^-* dx, (a > 0).
7-25 This problem outlines an alternative form of proof for the result of Theorem
7-8. It is based on the chain rule for differentiation and on a direct proof of
Corollary 7.8. Define the function $F(\alpha, \phi(\alpha), \psi(\alpha))$ by
the equation
$$F(\alpha, \phi(\alpha), \psi(\alpha)) = \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha)\,dx.$$
Then it follows from the chain rule for differentiation that the derivative of
the integral with respect to $\alpha$ is given by
$$\frac{dF}{d\alpha} = \frac{\partial F}{\partial \alpha} + \frac{\partial F}{\partial \psi}\frac{d\psi}{d\alpha} + \frac{\partial F}{\partial \phi}\frac{d\phi}{d\alpha}. \tag{A}$$
Use Definition 7-3 together with the first fundamental theorem of calculus
to prove that
$$\frac{\partial F}{\partial \psi} = f[\psi(\alpha), \alpha] \quad \text{and} \quad \frac{\partial F}{\partial \phi} = -f[\phi(\alpha), \alpha]. \tag{B}$$
Finally, obtain the statement of Theorem 7-8 by substituting results (B) into
(A) and giving a direct proof, as in the text, that
$$\frac{\partial}{\partial \alpha} \int_{\phi}^{\psi} f(x, \alpha)\,dx = \int_{\phi}^{\psi} \frac{\partial f}{\partial \alpha}\,dx,$$
where for the purposes of partial differentiation with respect to $\alpha$, the
limits $\phi$, $\psi$ are to be regarded as constants.
Section 7-6
7-26 Express in terms of a definite integral the arc length of the curve y = 1 + x 2
+ sin 2x, that lies between the points on the curve corresponding to x = 1
and x = 4.
7-27 Prove that the circumference of a circle of radius a is $2\pi a$ by using
the parametric equations of a circle x = a cos t, y = a sin t with
$0 \le t \le 2\pi$.
7-28 Find the area contained between the planes x = — 2 and x = 3 of the surface
of revolution about the x-axis generated by the curve y = 2 + cosh x.
[Hint: An antiderivative of cosh x is sinh x + C]
7-29 If the curve y = f(x) has an inverse $x = \phi(y)$, state the form taken by
Theorem 7-12 when the curve y = f(x) between the points (a, f(a)) and
(b, f(b)) is rotated about the y-axis.
7-30 Determine the volume contained between the parabola y = 2 + x + x 2 and
the cubic y = 5 + 2x + x 3 , which lies between the planes x = 1 and x = 2.
7-31 If the curve y = f(x) has an inverse $x = \phi(y)$, state the form taken by
Theorem 7-13 when the curve y = f(x) between the points (a, f(a)) and
(b, f(b)) is rotated about the y-axis.
Section 7-7
7-32 Evaluate the definite integral
$$\int^{3} (x^3 + 2x + 1)\,dx$$
by the trapezoidal rule using four intervals of equal length and then by
Simpson's rule for the same intervals. Compare the result with that obtained
by direct integration. Infer from your result that Simpson's rule is exact for
cubic equations despite the fact that it is based on a parabolic fitting of the
function.
7-33 State the form of the Lagrangian interpolation formula $L_2(x)$, and use it to
deduce Simpson's rule, (7-56), by applying it to the three points $(a, f(a))$,
$(a + h, f(a + h))$ and $(a + 2h, f(a + 2h))$ through which the function
$y = f(x)$ passes.
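7-34 Let the curve $\Gamma$ be defined in terms of the polar coordinates $(r, \theta)$ by means
of the equation
$$r = f(\theta),$$
where $f(\theta)$ is a continuous function. Then if $P_n$ is a partition of the interval
$\alpha \le \theta \le \beta$ into the points $\alpha = \theta_0 < \theta_1 < \cdots < \theta_n = \beta$ with the norm
$\|\Delta\|_{P_n}$, prove that the area $A$ between the origin and the curve $\Gamma$ which is
bounded by the radius vectors $\theta = \alpha$ and $\theta = \beta$ is given by
$$A = \lim_{\|\Delta\|_{P_n} \to 0} \sum_{i=1}^{n} \tfrac{1}{2} f^2(\xi_i)\,\Delta_i,$$
where $\theta_{i-1} \le \xi_i \le \theta_i$ and $\Delta_i = \theta_i - \theta_{i-1}$. Hence deduce that
$$A = \frac{1}{2}\int_{\alpha}^{\beta} f^2(\theta)\,d\theta.$$
Use this result to find the area swept out by the radius vector drawn from
the origin to the Archimedean spiral $r = k\theta$ between the radius vectors $\theta = \alpha$
and $\theta = \beta$, with $\beta > \alpha$.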
7-35 Consider a straight rod of length $L$ which has a uniform cross-sectional area.
Aligning the x-axis with the rod in such a manner that the origin coincides
with the left-hand end point, assume that the mass $M(x)$ of material contained
in the rod in the interval $[0, x]$ is given by
$$M(x) = \int_0^x \rho(t)\,dt.$$
Then the essentially non-negative function $\rho(x)$ is called the linear density
distribution of the matter in the rod, and by the first fundamental theorem of
calculus it follows that $\rho(x) = M'(x)$.
Now in mechanics the moment of inertia $I$ about an axis of a point mass $m$
situated at a perpendicular distance $x$ from that axis is defined to be $mx^2$. By
considering a partition $P_n$ of $0 \le x \le L$ into the points $0 = x_0 < x_1 < \cdots < x_n = L$
with the norm $\|\Delta\|_{P_n}$, prove that the moment of inertia $I$ of the
rod about an axis perpendicular to the rod and passing through an end point
is given by
$$I = \lim_{\|\Delta\|_{P_n} \to 0} \sum_{i=1}^{n} \xi_i^2\,M'(\xi_i)\,\Delta_i,$$
where $x_{i-1} \le \xi_i \le x_i$ and $\Delta_i = x_i - x_{i-1}$. Hence deduce that
$$I = \int_0^L x^2 \rho(x)\,dx.$$
In the case of a rod of mass $M$ having a uniform linear density $\rho(x) = \rho_0$,
deduce the relationship between $\rho_0$ and $M$ and use it to prove that the moment
of inertia of the rod about an axis perpendicular to its length and passing
through an end point is
$$\frac{ML^2}{3}.$$
7-36 Consider a circular disk of radius $a$, and suppose that the mass $M(r)$ of material
contained within a circle of radius $r$ drawn about its centre is given by
$$M(r) = 2\pi\int_0^r t\,\rho(t)\,dt.$$
Then the essentially non-negative function $\rho(r)$ is called the area density
distribution of the matter in the disk, and by the first fundamental theorem of
calculus it follows that $2\pi r\rho(r) = M'(r)$.
Use the form of argument outlined in the previous problem to prove that
the moment of inertia $I$ of the disk about an axis perpendicular to its plane
and passing through its centre is given by
$$I = 2\pi\int_0^a r^3 \rho(r)\,dr.$$
If the disk is of mass $M$ and has a uniform area density $\rho(r) = \rho_0$, deduce the
relationship between $\rho_0$ and $M$ and use it to prove that the moment of inertia
of the disk about an axis perpendicular to its plane and passing through its
centre is
$$\frac{Ma^2}{2}.$$
7-37 Indicate by means of simple examples how the integral inequality (7-1) may
be used to place upper and lower bounds on the integrals defining the area $A$
and the moment of inertia $I$ in Problems 7-34 to 7-36.
Systematic integration
8-1 Integration of elementary functions
The main objective of this chapter is to explore some of the systematic
methods for determining an antiderivative, that is, a function $F(x)$ whose
derivative is equal to some given function $f(x)$. As described in the previous
chapter, we shall denote the antiderivative of the function $f$ by $\int f(x)\,dx$ with
the understanding that
$$\int f(x)\,dx = F(x) + C \tag{8-1}$$
with $C$ an arbitrary constant.
Alternatively, as any indefinite integral of $f$ must also be an antiderivative
of $f$, we may identify $F(x)$ in Eqn (8-1) with $\int_a^x f(t)\,dt$, where $a$ is arbitrary, to
obtain the equivalent expression
$$\int f(x)\,dx = \int_a^x f(t)\,dt + C. \tag{8-2}$$
Remember that the symbol $\int f(x)\,dx$ for the antiderivative of $f$ derives from
differentiation and denotes the most general function whose derivative is $f$.
The allied symbol $\int_a^b f(x)\,dx$, denoting a definite integral of $f$, derives from
integration and is simply a real number. Considering the definition of an
antiderivative, we shall say that two antiderivatives are equal if they only
differ by a constant.
It should be recalled that the connection between the concepts of an
antiderivative and a definite integral is provided by the fundamental theorem
of calculus, which asserts that
$$\int_a^b f(x)\,dx = \Big\{\int f(x)\,dx\Big\}_{x=b} - \Big\{\int f(x)\,dx\Big\}_{x=a}.$$
In view of Eqn (8-1) this may be written
$$\int_a^b f(x)\,dx = F(b) - F(a). \tag{8-3}$$
Very often in texts the term indefinite integral is loosely ascribed to the
entire right-hand side of Eqn (8-2) instead of, as here, only to its first term.
This is usually justified by the fact that $a$ is arbitrary though, of course, it
does not necessarily follow that all possible constants $C$ can be absorbed into
the integral by a suitable choice of $a$. For example, we have the antiderivative
$$\int \cos x\,dx = \sin x + C,$$
though if for some particular problem it was appropriate to set $C = 3$, say,
then no choice of the arbitrary constant $a$ would enable us to equate
$\int_a^x \cos t\,dt$ and $\sin x + 3$, for this would imply that $\sin a = -3$.
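The remark about the constant $C = 3$ can be illustrated numerically. The
following Python sketch is an illustration only and is not part of the text;
the helper name `absorbed_constant` is our own. It relies on the fact that
$\int_a^x \cos t\,dt = \sin x - \sin a$, so the only constants that can be produced
by varying the lower limit are of the form $C = -\sin a$.

```python
import math

def absorbed_constant(a):
    """Constant C for which the integral of cos t from a to x equals sin x + C."""
    return -math.sin(a)

# Sample many lower limits a in [-10, 10] and record the constants they produce.
constants = [absorbed_constant(-10 + 0.01 * k) for k in range(2001)]

# Every attainable constant lies in [-1, 1], so C = 3 is never reached.
print(min(constants), max(constants))
print(any(abs(c - 3) < 1e-9 for c in constants))
```

The sampled constants never leave the interval $[-1, 1]$, in agreement with
the observation that $\sin a = -3$ has no real solution.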
Unfortunately, the theorems for the differentiation of wide classes of
functions seldom have any counterpart for determining antiderivatives.
Ultimately, success in finding an antiderivative depends on whether or not
the function $f$ can be so simplified that one may be recognized by using tables
of derivatives in reverse: that is, matching the desired derivative $f$ with one
in the table, and reading backwards to deduce an antiderivative. Thus, to
find the antiderivative of $3\sec x\tan x$, we first glean from Table 5-1 that
$$\frac{d}{dx}(\sec x) = \sec x\tan x$$
or, equivalently,
$$\frac{d}{dx}(3\sec x) = 3\sec x\tan x,$$
showing that the antiderivative is
$$\int 3\sec x\tan x\,dx = 3\sec x + C.$$
In colloquial terms, the process of finding the most general antiderivative of
the function $f(x)$ is called the 'integration of $f(x)$'.
Table 8-1 gives a preliminary working list of important integrals which
has been compiled from the tables of derivatives in Chapters 5 and 6.
The two separate results shown against number 3 are usually contracted to
$$\int \frac{dx}{x} = \log|x| + C,$$
with the tacit understanding that the arbitrary constant $C$ differs according
as $x$ is positive or negative. With obvious modifications, this convention will
be extended to include all integrals involving the logarithmic function.
Specific examples involving this convention are to be found in Problems
8-1 to 8-3.
The following statement is equivalent to both Eqn (8-1) and Eqn (8-2),
and it arises as a direct consequence of the definition of an antiderivative.
We formulate it as a general theorem.
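Table 8-1 Basic table for integrals
1. $\int x^n\,dx = \dfrac{x^{n+1}}{n+1} + C \quad (n \ne -1)$;
2. $\int a^x\,dx = \dfrac{a^x}{\log a} + C \quad (a > 0)$;
3. $\int \dfrac{dx}{x} = \log x + C$ for $x > 0$, and $\int \dfrac{dx}{x} = \log(-x) + C$ for $x < 0$;
4. $\int e^{ax}\,dx = \dfrac{1}{a}e^{ax} + C \quad (a \ne 0)$;
5. $\int \cos ax\,dx = \dfrac{1}{a}\sin ax + C \quad (a \ne 0)$;
6. $\int \sin ax\,dx = -\dfrac{1}{a}\cos ax + C \quad (a \ne 0)$;
7. $\int \dfrac{dx}{\sqrt{a^2 - x^2}} = \arcsin\dfrac{x}{a} + C$ for $|x| < |a|$;
8. $\int \dfrac{dx}{a^2 + x^2} = \dfrac{1}{a}\arctan\dfrac{x}{a} + C \quad (a \ne 0)$;
9. $\int \dfrac{dx}{\sqrt{a^2 + x^2}} = \operatorname{arcsinh}\dfrac{x}{a} + C$;
10. $\int \dfrac{dx}{\sqrt{x^2 - a^2}} = \operatorname{arccosh}\dfrac{x}{a} + C$ for $x > a$, and $-\operatorname{arccosh}\left(-\dfrac{x}{a}\right) + C$ for $x < -a$;
11. $\int \dfrac{dx}{a^2 - x^2} = \dfrac{1}{a}\operatorname{arctanh}\dfrac{x}{a} + C$ for $|x| < |a|$;
12. $\int \dfrac{dx}{x^2 - a^2} = -\dfrac{1}{a}\operatorname{arccoth}\dfrac{x}{a} + C$ for $|x| > |a|$.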
THEOREM 8-1
$$\frac{d}{dx}\int f(x)\,dx = f(x).$$
In words, this general result merely asserts the obvious fact that the
derivative of the antiderivative of a function $f(x)$ is the function $f(x)$ itself.
Its most frequent application is probably to the verification of antiderivatives.
For example, let us use the theorem to verify the antiderivative
$$\int \frac{g'\,dx}{\sqrt{a^2 - g^2}} = \arcsin\left(\frac{g}{a}\right) + C, \tag{A}$$
where $g = g(x)$ is some differentiable function of $x$ and $|g| < a$.
By Theorem 8-1 we must have
$$\frac{d}{dx}\int \frac{g'\,dx}{\sqrt{a^2 - g^2}} = \frac{g'}{\sqrt{a^2 - g^2}}. \tag{B}$$
Now, differentiating the right-hand side of (A) we find
$$\frac{d}{dx}\left[\arcsin\left(\frac{g}{a}\right) + C\right] = \frac{1}{\sqrt{1 - (g/a)^2}}\cdot\frac{g'}{a} = \frac{g'}{\sqrt{a^2 - g^2}},$$
which is identical with (B). Thus, (A) is verified.
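The same verification may be carried out numerically for a particular choice
of $g$. The Python sketch below is illustrative only: the choice $g(x) = \sin x$
with $a = 2$ and the helper name `central_difference` are our own assumptions,
made so that $|g| < a$ everywhere. It differentiates the right-hand side of (A)
by a central difference and compares the result with the integrand.

```python
import math

A = 2.0  # the constant a in (A); assumed value chosen so that |g| < a

def g(x):
    return math.sin(x)              # sample differentiable g with |g| <= 1 < a

def g_prime(x):
    return math.cos(x)

def integrand(x):
    return g_prime(x) / math.sqrt(A**2 - g(x)**2)

def antiderivative(x):
    return math.asin(g(x) / A)      # right-hand side of (A) with C = 0

def central_difference(func, x, h=1e-5):
    """Approximate the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Largest discrepancy between d/dx of the antiderivative and the integrand.
max_error = max(
    abs(central_difference(antiderivative, x) - integrand(x))
    for x in [0.1 * k for k in range(-30, 31)]
)
print(max_error)
```

A discrepancy at the level of rounding error is what Theorem 8-1 leads us to
expect; a numerical check of this kind does not, of course, replace the proof.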
A final general result of great value is the fact that the derivative of a
linear combination of functions is equal to the same linear combination of
their derivatives (Theorem 5-4). Expressed in terms of antiderivatives this
implies the following general theorem.
THEOREM 8-2
$$\int (k_1 f + k_2 g)\,dx = k_1\int f\,dx + k_2\int g\,dx.$$
It is, of course, this theorem that permits us to simplify many expressions
to the point at which antiderivatives may be deduced from tables of standard
integrals (antiderivatives) such as Table 8-1. Hence we have
$$\int (5x^2 - 2\cos x)\,dx = 5\int x^2\,dx - 2\int \cos x\,dx = \frac{5x^3}{3} - 2\sin x + C.$$
The separate arbitrary constants associated with each of the antiderivatives
on the right-hand side have, of course, been combined into the single arbitrary
constant C.
The remaining sections of this chapter are concerned with outlining the
details of the main techniques available for finding antiderivatives.
8-2 Integration by substitution
Possibly the most frequently used technique of integration is that in which
the variable under the integral sign is changed in a manner which simplifies
the task of finding the antiderivative. This process is known as integration by
substitution or integration by change of variable. It is in this technique that
the full significance of the symbol dx in Eqn (8-1) is first realized. Indeed, by
making a straightforward application of the chain rule for differentiation
(Theorem 5-7) we shall arrive at a simple mechanical rule for effecting a
variable change by using differentials.
Because composite functions (functions of a function) of $x$ often occur
under the integral sign we shall consider a general antiderivative of the form
$$I = \int k(x)\,f[g(x)]\,dx.$$
In order to cover all likely cases we shall consider the effect on $I$ of changing
the variable $x$ to the variable $u$, where $x$ and $u$ are related by
$$g(x) = h(u), \tag{8-4}$$
with $f$, $g$ differentiable functions.
Let us start by supposing that
$$I = \int k(x)\,f[g(x)]\,dx = F(x) + C, \tag{8-5}$$
so that we know
$$\frac{dF}{dx} = k(x)\,f[g(x)]. \tag{8-6}$$
Applying the chain rule to $F(x)$ gives
$$\frac{dF(x)}{du} = \frac{dF}{dx}\,\frac{dx}{du},$$
which, by virtue of Eqn (8-6), may be written
$$\frac{dF(x)}{du} = k(x)\,f[g(x)]\,\frac{dx}{du}.$$
On the assumption that Eqn (8-4) may be solved for $x$ in the form
$$x = g^{-1}[h(u)] \tag{8-7}$$
we arrive at the result
$$\frac{dF(x)}{du} = k\{g^{-1}[h(u)]\}\,f[h(u)]\,\frac{dx}{du}. \tag{8-8}$$
Now by implicit differentiation (Corollary 5-19 (a)) of Eqn (8-4), it follows
that provided $g'(x) \ne 0$,
$$\frac{dx}{du} = \frac{h'(u)}{g'(x)},$$
so that
$$\frac{dF(x)}{du} = \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}. \tag{8-9}$$
However, Eqn (8-9) simply asserts that $F(x)$ is an indefinite integral of
$$\frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}$$
and thus taking the antiderivative yields
$$\int \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}\,du = F(x) + C^*, \tag{8-10}$$
where $C^*$ is an arbitrary constant.
Comparing Eqns (8-5) and (8-10), and on the understanding that two
antiderivatives are equal if they only differ by a constant, we have thus proved
that
$$\int k(x)\,f[g(x)]\,dx = \int \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}\,du. \tag{8-11}$$
This forms the result of the following theorem:
THEOREM 8-3 (integration by substitution) If $g$, $h$ are differentiable
functions and $g(x) = h(u)$, with $g'(x) \ne 0$ and $x = g^{-1}[h(u)]$, then
$$\int k(x)\,f[g(x)]\,dx = \int \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}\,du.$$
Two special cases occur when (a) $k(x) \equiv 1$ and $g(x) \equiv x$, so that $g'(x) = 1$,
and (b) $k(x) \equiv 1$ and $h(u) \equiv u$, so that $h'(u) = 1$. These are stated as
Corollaries 8-3 (a, b) below, which are the results most often to be found in
textbooks.
Corollary 8-3 (a) If $x = h(u)$ is a differentiable function of $u$, then
$$\int f(x)\,dx = \int f[h(u)]\,h'(u)\,du.$$
In terms of the differential relationship $dh = h'(u)\,du$ this is also capable of
expression in the form
$$\int f(x)\,dx = \int f(h)\,dh.$$
Corollary 8-3 (b) If $g(x) = u$ is a differentiable function of $x$, with
$g'(x) \ne 0$, then
$$\int f[g(x)]\,dx = \int f(u)\left(\frac{dx}{du}\right)du$$
or, as $dx/du = 1/g'[g^{-1}(u)]$,
$$\int f[g(x)]\,dx = \int \frac{f(u)}{g'[g^{-1}(u)]}\,du.$$
All of these results may be conveniently summarized in the form of a single
simple mechanical rule for changing the variable in an antiderivative.
Rule 1 (Integration by substitution)
We suppose that in the antiderivative
$$I = \int k(x)\,f[g(x)]\,dx$$
it is required to change from the variable $x$ to the variable $u$ by means of the
relationship $g(x) = h(u)$, where $g$ and $h$ are differentiable functions, with
$g'(x) \ne 0$. The result may be deduced from $I$ above by:
(a) replacing $g(x)$ in $f[g(x)]$ by $h(u)$;
(b) solving $g(x) = h(u)$ for $x$ in the form $x = g^{-1}[h(u)]$ and then replacing
$x$ in $k(x)$ by this result;
(c) replacing $dx$ by $du$, where $du$ is obtained from the differential
relationship $g'(x)\,dx = h'(u)\,du$;
(d) replacing $x$ in $g'(x)$ by $x = g^{-1}[h(u)]$.
We now illustrate the application of this rule in a series of examples.
Unfortunately, although the rule tells us how to change the variable, it offers
us no information on the type of variable change that should be made. That
is to say it does not tell us the functional form of $f$ and $g$. Only experience
can help here.
Example 8-1 Evaluate the antiderivative
$$I = \int x^3\sqrt{1 + x^2}\,dx.$$
Solution This antiderivative is of the most general type contained in
Theorem 8-3. First we make the obvious identification $k(x) = x^3$ and then,
to remove the square root function which is difficult to manipulate, we shall
try setting
$$1 + x^2 = u^2.$$
That is to say, in the hope that it will lead to a simpler expression, we make
the further identifications
$$g(x) = 1 + x^2 \quad\text{and}\quad h(u) = u^2.$$
The function $f$ in Theorem 8-3 then becomes the square root function, with
$\sqrt{1 + x^2} = u$. Rather than solving for $x$, for the moment we shall use the
result $x^3 = x \cdot x^2 = x(u^2 - 1)$, when we find $x^3\sqrt{1 + x^2} = xu(u^2 - 1)$.
Now $g'(x) = 2x$ and $h'(u) = 2u$, so that the differential relation
$g'(x)\,dx = h'(u)\,du$ gives rise to $x\,dx = u\,du$. Hence, in differential form,
$$x^3\sqrt{1 + x^2}\,dx = u(u^2 - 1)\,x\,dx = u^2(u^2 - 1)\,du,$$
and so by the rule derived from Theorem 8-3,
$$I = \int x^3\sqrt{1 + x^2}\,dx = \int u^2(u^2 - 1)\,du.$$
The antiderivative on the right-hand side is now straightforward and may be
integrated on sight to give
$$I = \frac{u^5}{5} - \frac{u^3}{3} + C$$
or,
$$I = \frac{(1 + x^2)^{5/2}}{5} - \frac{(1 + x^2)^{3/2}}{3} + C.$$
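The result of Example 8-1 can be checked numerically on a particular interval.
The Python sketch below is illustrative only; the interval $[0, 2]$, the helper
`simpson` and the number of subintervals are our own choices. It compares a
composite Simpson's rule estimate of the original integral, the same estimate
of the transformed integral in $u$ (with limits $u = 1$ and $u = \sqrt{5}$), and the
difference $F(2) - F(0)$ of the antiderivative just found.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def integrand(x):
    return x**3 * math.sqrt(1 + x**2)

def F(x):
    """Antiderivative from Example 8-1 with C = 0."""
    u = math.sqrt(1 + x**2)
    return u**5 / 5 - u**3 / 3

numeric = simpson(integrand, 0.0, 2.0)
# After 1 + x^2 = u^2 the limits x = 0, 2 become u = 1, sqrt(5).
transformed = simpson(lambda u: u**2 * (u**2 - 1), 1.0, math.sqrt(5.0))
exact = F(2.0) - F(0.0)
print(numeric, transformed, exact)
```

The three values agree to within the accuracy of the quadrature.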
Example 8-2 Evaluate the antiderivative
$$I = \int \sqrt{1 + x^2}\,dx.$$
Solution In this antiderivative $k(x) = 1$, but it is not immediately clear how
best to change the variable. It is left to the reader to see why neither of the
possible substitutions $u^2 = 1 + x^2$ or $u = 1 + x^2$ brings about any effective
simplification. Instead, let us seek to remove the square root by making the
substitution $x = \sinh u$, so that the problem becomes analogous to Corollary
8-3 (a). Then $1 + x^2 = 1 + \sinh^2 u = \cosh^2 u$, so that $\sqrt{1 + x^2} = \cosh u$.
Next, as $g(x) = x$ and $h(u) = \sinh u$, $g'(x) = 1$, $h'(u) = \cosh u$ and so
$dx = \cosh u\,du$. Applying the rule then gives
$$\sqrt{1 + x^2}\,dx = \cosh u \cdot \cosh u\,du = \cosh^2 u\,du,$$
whence
$$I = \int \cosh^2 u\,du.$$
Now use the identity $\cosh^2 u = \tfrac{1}{2}(\cosh 2u + 1)$ to give
$$I = \tfrac{1}{2}\int (\cosh 2u + 1)\,du = \tfrac{1}{4}\sinh 2u + \frac{u}{2} + C.$$
To return to the variable $x$ it is necessary to use the results $u = \operatorname{arcsinh} x$,
$\cosh u = \sqrt{1 + x^2}$ together with the identity $\sinh 2u = 2\sinh u \cdot \cosh u$
to obtain
$$I = \tfrac{1}{2}\left[x\sqrt{1 + x^2} + \operatorname{arcsinh} x\right] + C.$$
Example 8-3 Evaluate the antiderivative
$$I = \int \cos(1 + 3x)\,dx.$$
Solution This antiderivative has $k(x) = 1$, and by setting $1 + 3x = u$ so
that $g(x) = 1 + 3x$, $h(u) = u$ it reduces to the situation of Corollary 8-3 (b).
Applying the rule we find that $\cos(1 + 3x) = \cos u$ and $3\,dx = du$, whence
$$I = \int \tfrac{1}{3}\cos u\,du = \tfrac{1}{3}\sin u + C,$$
and thus
$$I = \tfrac{1}{3}\sin(1 + 3x) + C.$$
Example 8-4 Evaluate the antiderivative
$$I = \int 2x\sqrt{1 + x^2}\,dx.$$
Solution Setting $u = 1 + x^2$ it follows that $du = 2x\,dx$, so that
$$2x\sqrt{1 + x^2}\,dx = \sqrt{u}\,du,$$
whence
$$I = \int \sqrt{u}\,du = \tfrac{2}{3}u^{3/2} + C = \tfrac{2}{3}(1 + x^2)^{3/2} + C.$$
It is interesting to notice that when the situation found in Example 8-4 is
expressed in terms of Theorem 8-3 by making the identification $k(x) = g'(x)$
and then setting $u = g(x)$ it gives rise to the general result
$$\int g'(x)\,f[g(x)]\,dx = \int f(u)\,du. \tag{8-12}$$
This is not, of course, a new result since it is no more than the statement
of Corollary 8-3 (a) with the roles of $x$ and $u$ interchanged.
It is an immediate consequence of Eqn (8-3) that Theorem 8-3, together
with its corollaries, also applies to definite integrals provided that the limits
are also transformed by the same transformation law. The restatement of
Theorem 8-3 in terms of definite integrals is as follows:
THEOREM 8-4 (integration of definite integrals by substitution) If $g$, $h$ are
differentiable functions and $g(x) = h(u)$, with $g'(x) \ne 0$ and $x = g^{-1}[h(u)]$,
$u = h^{-1}[g(x)]$, then
$$\int_a^b k(x)\,f[g(x)]\,dx = \int_{h^{-1}[g(a)]}^{h^{-1}[g(b)]} \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}\,du.$$
One specially simple case of this theorem merits recording in the form of a
corollary. It is the result corresponding to Eqn (8-12) and is obtained by
making the identifications $k(x) = g'(x)$, $u = g(x)$.
Corollary 8-4 If $u = g(x)$ is a differentiable function, then
$$\int_a^b f[g(x)]\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du.$$
When expressed in the form of a mechanical rule, Theorem 8-4 is as
straightforward to apply as was our previous rule.
Rule 2 (Integrating definite integrals by substitution)
We suppose that in the definite integral
$$I = \int_a^b k(x)\,f[g(x)]\,dx$$
it is required to change from the variable $x$ to the variable $u$ by means of the
relationship $g(x) = h(u)$, where $g$ and $h$ are differentiable functions, with
$g'(x) \ne 0$. The result may be deduced from $I$ above by:
(a) transforming the differential expression $k(x)\,f[g(x)]\,dx$ as indicated
in Rule 1;
(b) solving $g(x) = h(u)$ for $u$ in the form $u = h^{-1}[g(x)]$ and replacing
the upper limit $b$ by $h^{-1}[g(b)]$ and the lower limit $a$ by $h^{-1}[g(a)]$.
Example 8-5 Evaluate the definite integral
$$I = \int_0^1 x^2\sqrt{1 - x^2}\,dx.$$
Solution Let us make the substitution $x = \sin u$, so that $dx = \cos u\,du$,
when
$$x^2\sqrt{1 - x^2}\,dx = \sin^2 u \cdot \cos u \cdot \cos u\,du = \sin^2 u \cdot \cos^2 u\,du.$$
Then, as $u = \arcsin x$, using the principal branch of the inverse sine function, we
find from Rule 2 that
$$\int_0^1 x^2\sqrt{1 - x^2}\,dx = \int_{\arcsin 0}^{\arcsin 1} \sin^2 u \cdot \cos^2 u\,du = \int_0^{\pi/2} \sin^2 u \cdot \cos^2 u\,du.$$
To evaluate this last definite integral we use a technique from Chapter 6
which is often helpful. From Definition 6-6 we may write
$$\sin^2 u \cdot \cos^2 u = \left(\frac{e^{iu} - e^{-iu}}{2i}\right)^2\left(\frac{e^{iu} + e^{-iu}}{2}\right)^2
= \left(\frac{e^{2iu} - 2 + e^{-2iu}}{-4}\right)\left(\frac{e^{2iu} + 2 + e^{-2iu}}{4}\right)
= -\frac{1}{16}\left(e^{4iu} - 2 + e^{-4iu}\right),$$
and thus
$$\sin^2 u \cdot \cos^2 u = \tfrac{1}{8}(1 - \cos 4u).$$
Using this result in the definite integral, which may then be evaluated on
sight, we finally obtain
$$I = \frac{1}{8}\int_0^{\pi/2} (1 - \cos 4u)\,du = \frac{1}{8}\Big[u - \tfrac{1}{4}\sin 4u\Big]_0^{\pi/2} = \frac{\pi}{16},$$
and so
$$\int_0^1 x^2\sqrt{1 - x^2}\,dx = \frac{\pi}{16}.$$
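Rule 2 can be illustrated numerically for this example. The Python sketch
below is illustrative only (the helper `simpson` and the choice of 1000
subintervals are our own). It estimates both the original integral in $x$ and
the transformed integral in $u$, with the limits changed from $[0, 1]$ to
$[0, \pi/2]$, and compares them with $\pi/16$.

```python
import math

def simpson(func, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

# Original form: integrand in x on [0, 1].
original = simpson(lambda x: x**2 * math.sqrt(1 - x**2), 0.0, 1.0, 1000)

# Transformed form after x = sin u: limits become arcsin 0 = 0 and arcsin 1 = pi/2.
transformed = simpson(
    lambda u: math.sin(u)**2 * math.cos(u)**2, 0.0, math.pi / 2, 1000
)

exact = math.pi / 16
print(original, transformed, exact)
```

Both estimates approach $\pi/16$. The transformed integrand is smooth, whereas
the original has an unbounded derivative at $x = 1$, so with the same number of
subintervals the transformed estimate is noticeably the more accurate.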
Example 8-6 Evaluate the definite integral
$$I = \int_0^1 (2x + 5)\cosh(x^2 + 5x + 1)\,dx.$$
Solution Inspection shows that this example is of the form of Corollary 8-4,
with the function $f \equiv \cosh$ and $g(x) = x^2 + 5x + 1$.
As $g(0) = 1$, $g(1) = 7$, by setting $u = g(x)$ we at once obtain
$$I = \int_1^7 \cosh u\,du = (\sinh 7 - \sinh 1).$$
8-3 Integration by parts
This most valuable technique is based on Theorem 5-5, concerning the
derivative of the product of two functions. That theorem asserts that if $f$, $g$
are two differentiable functions of $x$, then
$$\frac{d}{dx}[f(x)g(x)] = [f(x)g'(x)] + [f'(x)g(x)].$$
Taking the antiderivative of this result gives
$$f(x)g(x) = \int f(x)g'(x)\,dx + \int g(x)f'(x)\,dx$$
which, on rearrangement, becomes
$$\int f(x)g'(x)\,dx = f(x)g(x) - \int g(x)f'(x)\,dx. \tag{8-13}$$
This is one form of the required result. Using the differential notation
$df = f'(x)\,dx$, $dg = g'(x)\,dx$ enables this to be contracted to the equivalent
and easily remembered alternative form
$$\int f\,dg = fg - \int g\,df. \tag{8-14}$$
These results are now formulated as our next theorem:
THEOREM 8-5 (integration by parts) If $f$, $g$ are differentiable functions of $x$,
then
$$\int f(x)g'(x)\,dx = f(x)g(x) - \int g(x)f'(x)\,dx$$
or, expressed in differential notation,
$$\int f\,dg = fg - \int g\,df.$$
This useful theorem is the nearest possible approach to a general theorem
for finding the antiderivative of the product of two functions. It depends on
the fact that often the antiderivative $\int g\,df$ is easier to determine than the
antiderivative $\int f\,dg$. Naturally, the technique of integration by substitution
can also be employed when evaluating $\int g\,df$.
When definite integrals are involved it is not difficult to see that the result
is still valid provided the limits are also applied to the product $fg$. The general
result is as follows:
THEOREM 8-6 (integration by parts: definite integral) If $f$, $g$ are differentiable
functions of $x$ in $[a, b]$, then
$$\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b g(x)f'(x)\,dx
= [f(b)g(b)] - [f(a)g(a)] - \int_a^b g(x)f'(x)\,dx.$$
As before, we illustrate both of these theorems by means of a series of
examples. These have been carefully chosen to demonstrate a variety of
situations in which integration by parts is useful.
Example 8-7 Evaluate the antiderivative
$$I = \int x^k\log x\,dx \quad\text{for } x > 0,\ k \ne -1.$$
Solution The problem here, as with all applications of the technique of
integration by parts, is to decide upon the functions $f$ and $g$. A little
experimentation will soon convince the reader that $I$ will only simplify if we set
$f(x) = \log x$ and $g(x) = x^{k+1}/(k + 1)$, for then $g'(x) = x^k$ and $f'(x) = 1/x$.
Accordingly we write $I$ in the form
$$I = \int \log x\;d\left[\frac{x^{k+1}}{k + 1}\right].$$
Applying Theorem 8-5 gives
$$I = \frac{x^{k+1}\log x}{k + 1} - \int \frac{x^{k+1}}{k + 1}\cdot\frac{1}{x}\,dx
= \frac{x^{k+1}\log x}{k + 1} - \frac{x^{k+1}}{(k + 1)^2} + C.$$
Example 8-8 Evaluate the definite integral
$$\int_0^{1/2} \arcsin x\,dx.$$
Solution This time we make the identifications $f(x) = \arcsin x$ and $g(x) = x$
and write
$$\int_0^{1/2} \arcsin x\;d[x] = x\arcsin x\Big|_0^{1/2} - \int_0^{1/2} \frac{x\,dx}{\sqrt{1 - x^2}}. \tag{A}$$
We have
$$x\arcsin x\Big|_0^{1/2} = \pi/12 - 0 = \pi/12,$$
but the definite integral on the right-hand side is still not recognizable. To
simplify it let us now set $u = 1 - x^2$ so that $x\,dx = -\tfrac{1}{2}\,du$; using Theorem
8-4 we obtain
$$\int_0^{1/2} \frac{x\,dx}{\sqrt{1 - x^2}} = -\frac{1}{2}\int_1^{3/4} \frac{du}{\sqrt{u}} = -\frac{1}{2}\Big[2u^{1/2}\Big]_1^{3/4} = 1 - \frac{\sqrt{3}}{2}.$$
Combining this result with (A) gives
$$\int_0^{1/2} \arcsin x\,dx = \frac{\pi}{12} + \frac{\sqrt{3}}{2} - 1.$$
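The two sides of the integration by parts formula used in Example 8-8 can be
compared numerically. The Python sketch below is illustrative only; the helper
`simpson` and the number of subintervals are our own choices. It evaluates the
left-hand side directly, then the boundary term minus the remaining integral
as in Theorem 8-6, and compares both with $\pi/12 + \sqrt{3}/2 - 1$.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

# Left-hand side: the integral of arcsin x over [0, 1/2].
lhs = simpson(math.asin, 0.0, 0.5)

# Right-hand side of Theorem 8-6 with f = arcsin x and g = x.
boundary = 0.5 * math.asin(0.5) - 0.0 * math.asin(0.0)
remaining = simpson(lambda x: x / math.sqrt(1 - x**2), 0.0, 0.5)
by_parts = boundary - remaining

closed_form = math.pi / 12 + math.sqrt(3) / 2 - 1
print(lhs, by_parts, closed_form)
```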
Example 8-9 Evaluate the antiderivative
$$I = \int e^{ax}\sin bx\,dx.$$
Solution This time we choose to make the identification $f(x) = \sin bx$,
$g(x) = (1/a)e^{ax}$ and to write $I$ in the form
$$I = \int \sin bx\;d\left[\frac{1}{a}e^{ax}\right].$$
Integrating by parts we find
$$\int e^{ax}\sin bx\,dx = \frac{1}{a}e^{ax}\sin bx - \frac{b}{a}\int e^{ax}\cos bx\,dx.$$
Now let us use this same device on the second term above to obtain
$$\int e^{ax}\sin bx\,dx = \frac{1}{a}e^{ax}\sin bx - \frac{b}{a}\int \cos bx\;d\left[\frac{1}{a}e^{ax}\right]
= \frac{1}{a}e^{ax}\sin bx - \frac{b}{a^2}e^{ax}\cos bx - \frac{b^2}{a^2}\int e^{ax}\sin bx\,dx + C.$$
Combining terms gives
$$\left(1 + \frac{b^2}{a^2}\right)\int e^{ax}\sin bx\,dx = \frac{e^{ax}(a\sin bx - b\cos bx)}{a^2} + C,$$
and so
$$\int e^{ax}\sin bx\,dx = \frac{e^{ax}(a\sin bx - b\cos bx)}{a^2 + b^2} + C^*,$$
where $C^*$ is related to $C$ by $C^* = a^2C/(a^2 + b^2)$. In fact there is no necessity
to distinguish between $C$ and $C^*$, since as $C$ was an arbitrary constant of
integration, $C^*$ is also an arbitrary constant. For this reason it is not
customary to redefine arbitrary constants when, as above, they are simply
multiplied by a constant factor.
8-4 Reduction formulae
It not infrequently happens that an antiderivative $I$ involving a parameter $m$
may be reduced by means of the technique of integration by parts to an
expression in which the parameter has a value differing by an integer $k$ from
its original value. If we denote such an antiderivative by $I_m$, then a typical
situation is the one in which we arrive at an expression of the form
$$I_m = A(m) + I_{m-1}, \tag{8-15}$$
where $A(m)$ is some known function.
Expressions of this form provide an algorithm for the computation of any
antiderivative of the given type once one of them is known, for the $I_m$ are
then defined recursively by this relation in terms of $I_1$, say. It is customary to
refer to expressions of the general form of Eqn (8-15) as reduction formulae.
The same idea is equally applicable, without essential modification, to definite
integrals.
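Example 8-10 Determine the reduction formula for
$$I_m = \int \cos^m\theta\,d\theta.$$
Use the result to determine $I_7$.
Solution We rewrite $I_m$ as follows and use integration by parts.
$$\begin{aligned}
I_m &= \int \cos^{m-1}\theta\;d(\sin\theta)\\
&= \cos^{m-1}\theta\sin\theta - \int \sin\theta\,(m - 1)\cos^{m-2}\theta\,(-\sin\theta)\,d\theta\\
&= \cos^{m-1}\theta\sin\theta + (m - 1)\int \cos^{m-2}\theta\sin^2\theta\,d\theta\\
&= \cos^{m-1}\theta\sin\theta + (m - 1)\int \cos^{m-2}\theta\,(1 - \cos^2\theta)\,d\theta\\
&= \cos^{m-1}\theta\sin\theta + (m - 1)\int \cos^{m-2}\theta\,d\theta - (m - 1)\int \cos^m\theta\,d\theta.
\end{aligned}$$
Recalling the definition of $I_m$ we discover that this may be re-expressed in
terms of $I_m$ and $I_{m-2}$ as
$$I_m = \cos^{m-1}\theta\sin\theta + (m - 1)I_{m-2} - (m - 1)I_m,$$
whence we arrive at the required reduction formula
$$I_m = \frac{\cos^{m-1}\theta\sin\theta}{m} + \left(\frac{m - 1}{m}\right)I_{m-2}.$$
Setting $m = 7$ gives
$$\begin{aligned}
I_7 &= \frac{\cos^6\theta\sin\theta}{7} + \frac{6}{7}I_5\\
&= \frac{\cos^6\theta\sin\theta}{7} + \frac{6}{7}\left(\frac{\cos^4\theta\sin\theta}{5} + \frac{4}{5}I_3\right)\\
&= \frac{\cos^6\theta\sin\theta}{7} + \frac{6}{35}\cos^4\theta\sin\theta + \frac{24}{35}\left(\frac{\cos^2\theta\sin\theta}{3} + \frac{2}{3}I_1\right).
\end{aligned}$$
As $I_1 = \int \cos\theta\,d\theta = \sin\theta + C$ this gives the result
$$\int \cos^7\theta\,d\theta = \frac{1}{7}\cos^6\theta\sin\theta + \frac{6}{35}\cos^4\theta\sin\theta + \frac{8}{35}\cos^2\theta\sin\theta + \frac{16}{35}\sin\theta + C.$$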
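Because a reduction formula is recursive, it translates directly into a
recursive procedure. The Python sketch below is illustrative only; the
function names are our own, and the arbitrary constant is taken to be zero.
It applies the reduction formula of Example 8-10, starting from $I_0 = \theta$ and
$I_1 = \sin\theta$, and compares the result for $m = 7$ with the explicit expression
obtained above and with a numerical derivative.

```python
import math

def cos_power_antiderivative(m, theta):
    """I_m(theta) from the reduction formula, with the constant taken as zero."""
    if m == 0:
        return theta
    if m == 1:
        return math.sin(theta)
    leading = math.cos(theta) ** (m - 1) * math.sin(theta) / m
    return leading + (m - 1) / m * cos_power_antiderivative(m - 2, theta)

def explicit_I7(theta):
    """The expanded result for m = 7 obtained in Example 8-10 (C = 0)."""
    c, s = math.cos(theta), math.sin(theta)
    return c**6 * s / 7 + 6 * c**4 * s / 35 + 8 * c**2 * s / 35 + 16 * s / 35

def central_difference(func, x, h=1e-5):
    return (func(x + h) - func(x - h)) / (2 * h)

for theta in (0.3, 1.0, 2.5):
    recursive = cos_power_antiderivative(7, theta)
    slope = central_difference(lambda t: cos_power_antiderivative(7, t), theta)
    print(theta, recursive, explicit_I7(theta), slope, math.cos(theta) ** 7)
```

For each sample value of $\theta$ the recursive and explicit values coincide, and
the numerical derivative reproduces $\cos^7\theta$.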
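Example 8-11 Evaluate the definite integral
$$J_m = \int_0^{\pi/2} \cos^m\theta\,d\theta = \int_0^{\pi/2} \sin^m\theta\,d\theta.$$
Solution We can make use of the reduction formula determined in the
previous example. It follows from
$$I_m = \frac{\cos^{m-1}\theta\sin\theta}{m} + \left(\frac{m - 1}{m}\right)I_{m-2}$$
that the definite integral $J_m$ obeys the reduction formula
$$J_m = \frac{\cos^{m-1}\theta\sin\theta}{m}\Big|_0^{\pi/2} + \left(\frac{m - 1}{m}\right)J_{m-2} = \left(\frac{m - 1}{m}\right)J_{m-2}.$$
We must now consider separately even and odd values of $m$. Firstly, if $m$ is
even, so that we may write $m = 2n$, then
$$J_{2n} = \frac{2n - 1}{2n}\cdot\frac{2n - 3}{2n - 2}\cdots\frac{1}{2}\,J_0.$$
Secondly, if $m$ is odd, so that we may write $m = 2n + 1$, then
$$J_{2n+1} = \frac{2n}{2n + 1}\cdot\frac{2n - 2}{2n - 1}\cdots\frac{2}{3}\,J_1.$$
Hence, as $J_0 = \int_0^{\pi/2} d\theta = \tfrac{1}{2}\pi$ and $J_1 = \int_0^{\pi/2} \cos\theta\,d\theta = 1$, we
obtain:
$$J_{2n} = \frac{1\cdot 3\cdot 5\cdots(2n - 1)}{2\cdot 4\cdot 6\cdots 2n}\cdot\frac{\pi}{2},
\qquad
J_{2n+1} = \frac{2\cdot 4\cdot 6\cdots 2n}{3\cdot 5\cdot 7\cdots(2n + 1)}.$$
Finally let us prove that
$$J_m = \int_0^{\pi/2} \cos^m x\,dx = \int_0^{\pi/2} \sin^m x\,dx.$$
To achieve this make the variable change $x = \tfrac{1}{2}\pi - u$ in $J_m$ to obtain
$$\int_0^{\pi/2} \cos^m x\,dx = -\int_{\pi/2}^{0} \cos^m(\tfrac{1}{2}\pi - u)\,du = \int_0^{\pi/2} \cos^m(\tfrac{1}{2}\pi - u)\,du = \int_0^{\pi/2} \sin^m u\,du.$$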
This last result is of some interest historically, as it provided the first
infinite product representation for $\pi$. One form of the argument used to derive
this result proceeds as follows.
It is readily seen from the expressions for $J_{2n}$ and $J_{2n+1}$ that
$$\frac{\pi}{2} = \left[\frac{2\cdot 4\cdot 6\cdots 2n}{3\cdot 5\cdots(2n - 1)}\right]^2\cdot\frac{1}{2n + 1}\cdot\frac{J_{2n}}{J_{2n+1}}. \tag{8-16}$$
Now in the interval $(0, \tfrac{1}{2}\pi)$ the following inequalities hold:
$$\sin^{2n-1} x > \sin^{2n} x > \sin^{2n+1} x > 0,$$
so that as
$$J_m = \int_0^{\pi/2} \sin^m x\,dx,$$
it follows at once that
$$J_{2n-1} > J_{2n} > J_{2n+1} > 0.$$
This is equivalent to
$$\frac{J_{2n-1}}{J_{2n+1}} > \frac{J_{2n}}{J_{2n+1}} > 1, \tag{8-17}$$
but as
$$\frac{J_{2n-1}}{J_{2n+1}} = \frac{2n + 1}{2n}$$
we must have
$$\lim_{n\to\infty} \frac{J_{2n-1}}{J_{2n+1}} = 1. \tag{8-18}$$
By virtue of Eqns (8-17) and (8-18) it also follows that
$$\lim_{n\to\infty} \frac{J_{2n}}{J_{2n+1}} = 1.$$
So, taking the limit of Eqn (8-16) as $n \to \infty$, we arrive at the expression
$$\frac{\pi}{2} = \lim_{n\to\infty}\left(\frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdots\frac{2n - 2}{2n - 1}\cdot\frac{2n}{2n - 1}\cdot\frac{2n}{2n + 1}\right). \tag{8-19}$$
This famous result, called an infinite product, was first obtained by the
17th-century mathematician John Wallis. If $S_n$ denotes the $n$th partial product
$$S_n = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdots\frac{2n - 2}{2n - 1}\cdot\frac{2n}{2n - 1}\cdot\frac{2n}{2n + 1},$$
then the limit in Eqn (8-19) is to be interpreted to mean that $|\tfrac{1}{2}\pi - S_n| \to 0$
as $n \to \infty$.
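The convergence of the partial products $S_n$ to $\tfrac{1}{2}\pi$ is easily observed
numerically. The Python sketch below is illustrative only; the function name
`wallis_partial_product` is our own. It forms $S_n$ as the product of the factors
$\frac{2k}{2k-1}\cdot\frac{2k}{2k+1}$ for $k = 1, 2, \ldots, n$ and reports $|\tfrac{1}{2}\pi - S_n|$ for
increasing $n$.

```python
import math

def wallis_partial_product(n):
    """n-th partial product S_n of the Wallis product for pi/2."""
    product = 1.0
    for k in range(1, n + 1):
        product *= (2 * k) / (2 * k - 1) * (2 * k) / (2 * k + 1)
    return product

# The error |pi/2 - S_n| shrinks as n grows, but only slowly.
errors = [abs(math.pi / 2 - wallis_partial_product(n)) for n in (10, 100, 1000)]
print(errors)
```

The errors decrease roughly in proportion to $1/n$, so the product, although of
great historical interest, is not an efficient means of computing $\pi$.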
Reduction formulae may involve more than one parameter, as the final
example illustrates.
Example 8-12 Show that
$$I_{m,n} = \int \sin^m x\cos^n x\,dx$$
satisfies the reduction formula
$$(m + n)I_{m,n} = -\sin^{m-1} x\cos^{n+1} x + (m - 1)I_{m-2,n}.$$
Solution Write $I_{m,n}$ in the form shown below and integrate by parts.
$$\begin{aligned}
I_{m,n} &= \int \sin^{m-1} x\cos^n x\;d(-\cos x)\\
&= -\sin^{m-1} x\cos^{n+1} x - \int (-\cos x)\left[(m - 1)\sin^{m-2} x\cos^{n+1} x - n\sin^m x\cos^{n-1} x\right]dx\\
&= -\sin^{m-1} x\cos^{n+1} x + (m - 1)I_{m-2,n+2} - nI_{m,n}.
\end{aligned}$$
Next reduce $I_{m-2,n+2}$ to a simpler form by writing
$$I_{m-2,n+2} = \int \sin^{m-2} x\cos^{n+2} x\,dx = \int \sin^{m-2} x\cos^n x\,(1 - \sin^2 x)\,dx,$$
which shows that
$$I_{m-2,n+2} = I_{m-2,n} - I_{m,n}.$$
Using this to eliminate $I_{m-2,n+2}$ from the previous result gives
$$I_{m,n} = -\sin^{m-1} x\cos^{n+1} x + (m - 1)I_{m-2,n} - (m - 1)I_{m,n} - nI_{m,n}$$
or,
$$(m + n)I_{m,n} = -\sin^{m-1} x\cos^{n+1} x + (m - 1)I_{m-2,n}.$$
8-5 Integration of rational functions — partial fractions
It will be recalled from Chapter 2 that a rational fraction is a quotient
$N(x)/D(x)$, in which $N(x)$ and $D(x)$ are polynomials. Antiderivatives of
rational fractions are often required and in this section we indicate ways of
expressing the fractions as the sum of simpler expressions, the antiderivatives
of which are either known or may be found by standard methods. Our
approach to the general problem of finding the antiderivative
$$\int \frac{N(x)}{D(x)}\,dx$$
will be to first consider some important special cases.
Case (a) Suppose that $N(x)$ is of degree 0 and $D(x)$ is a polynomial of
degree 1 and write
$$\frac{N(x)}{D(x)} = \frac{1}{cx + d}.$$
Then, making the substitution $u = cx + d$, we find
$$\int \frac{dx}{cx + d} = \frac{1}{c}\int \frac{du}{u} = \frac{1}{c}\log|u| + C$$
and so
$$\int \frac{dx}{cx + d} = \frac{1}{c}\log|cx + d| + C.$$
A similar argument establishes that
$$\int \frac{dx}{(cx + d)^n} = \frac{-1}{c(n - 1)}\cdot\frac{1}{(cx + d)^{n-1}} + C.$$
Case (b) Suppose $N(x)$ is of degree 0 and $D(x)$ is of degree 2 and write
$$\frac{N(x)}{D(x)} = \frac{1}{ax^2 + bx + c}.$$
Then completing the square in the denominator $D(x)$ gives
$$ax^2 + bx + c = a\left[\left(x + \frac{b}{2a}\right)^2 + \left(\frac{c}{a} - \frac{b^2}{4a^2}\right)\right] = a\left[\left(x + \frac{b}{2a}\right)^2 + \alpha\right],$$
where $\alpha = (c/a) - (b^2/4a^2)$ may be positive, negative, or zero. Making the
variable change $u = x + (b/2a)$ then shows that
$$\int \frac{dx}{ax^2 + bx + c} = \frac{1}{a}\int \frac{du}{u^2 + \alpha}.$$
This is a standard integral which may be identified from Table 8-1 once the
sign of $\alpha$ has been determined. It will involve either the function arctan or the
function arctanh.
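The dependence on the sign of $\alpha$ can be made concrete. The Python sketch below
is illustrative only; the function names and test coefficients are our own
choices. It selects the appropriate entry of Table 8-1 (entry 8 when $\alpha > 0$,
entries 11 and 12 when $\alpha < 0$, and the elementary integral of $1/u^2$ when
$\alpha = 0$) and checks each result by numerical differentiation. Since Python's
`math` module has no inverse hyperbolic cotangent, the sketch uses the identity
$\operatorname{arccoth} z = \operatorname{arctanh}(1/z)$ for $|z| > 1$.

```python
import math

def quadratic_reciprocal_antiderivative(a, b, c, x):
    """Antiderivative of 1/(a x^2 + b x + c), constant of integration omitted."""
    alpha = c / a - b**2 / (4 * a**2)
    u = x + b / (2 * a)
    if alpha > 0:
        r = math.sqrt(alpha)
        return math.atan(u / r) / (a * r)
    if alpha < 0:
        r = math.sqrt(-alpha)
        if abs(u) < r:
            return -math.atanh(u / r) / (a * r)
        return -math.atanh(r / u) / (a * r)   # arccoth(z) = arctanh(1/z)
    return -1.0 / (a * u)                      # alpha == 0: integral of du/u^2

def central_difference(func, x, h=1e-5):
    return (func(x + h) - func(x - h)) / (2 * h)

# (a, b, c, x) samples covering alpha > 0, alpha < 0 (both ranges) and alpha = 0.
cases = [(1, 2, 5, 0.3), (1, 0, -4, 1.0), (1, 0, -4, 3.0), (1, 2, 1, 0.5), (2, -3, 4, 1.2)]
max_error = 0.0
for a, b, c, x in cases:
    slope = central_difference(
        lambda t: quadratic_reciprocal_antiderivative(a, b, c, t), x
    )
    max_error = max(max_error, abs(slope - 1.0 / (a * x**2 + b * x + c)))
print(max_error)
```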
Case (c) Suppose $N(x)$ is of degree 1 and $D(x)$ is of degree 2 and write
$$\frac{N(x)}{D(x)} = \frac{px + q}{ax^2 + bx + c}.$$
Then we can write
$$I = \int \frac{px + q}{ax^2 + bx + c}\,dx = \int \frac{(p/2a)(2ax + b) + [q - (pb/2a)]}{ax^2 + bx + c}\,dx,$$
from which we find
$$I = \frac{p}{2a}\int \frac{2ax + b}{ax^2 + bx + c}\,dx + \left(\frac{2aq - pb}{2a}\right)\int \frac{dx}{ax^2 + bx + c}.$$
The second antiderivative is the one discussed in (b) above, and by setting
$u = ax^2 + bx + c$, the first antiderivative reduces to
$$\int \frac{2ax + b}{ax^2 + bx + c}\,dx = \int \frac{du}{u} = \log|u| + C = \log|ax^2 + bx + c| + C.$$
Combining this result with that of Case (b) then leads to the desired
antiderivative $I$.
Case (d) Suppose $N(x)$ is of degree 1 and $D(x)$ is a quadratic raised to the
power $n > 1$ and write
$$\frac{N(x)}{D(x)} = \frac{px + q}{(ax^2 + bx + c)^n}.$$
Then, using the identity
$$px + q = \left(\frac{p}{2a}\right)(2ax + b) + \left(q - \frac{pb}{2a}\right)$$
enables us to write
$$\int \frac{px + q}{(ax^2 + bx + c)^n}\,dx = \left(\frac{p}{2a}\right)\int \frac{2ax + b}{(ax^2 + bx + c)^n}\,dx + \left(q - \frac{pb}{2a}\right)\int \frac{dx}{(ax^2 + bx + c)^n}.$$
Setting $u = ax^2 + bx + c$ in the first antiderivative on the right-hand side
then leads to
$$\int \frac{2ax + b}{(ax^2 + bx + c)^n}\,dx = \int \frac{du}{u^n} = \left(\frac{-1}{n - 1}\right)\frac{1}{u^{n-1}} + C = \left(\frac{-1}{n - 1}\right)\frac{1}{(ax^2 + bx + c)^{n-1}} + C.$$
The second antiderivative on the right-hand side must be evaluated by means
of a reduction formula.
In the case $n = 1$ we have the obvious result
$$\int \frac{2ax + b}{ax^2 + bx + c}\,dx = \log|ax^2 + bx + c| + C.$$
Having considered a number of special cases we must now examine how
we should proceed when D(x) is any polynomial with real coefficients, and
the degree of the polynomial N(x) is less than that of D(x). The coefficient
ao of the highest power of x in D(x) will be assumed to be unity, since if this
is not the case it can always be made so by division of N(x) and D(x) by ao-
Now we know from Corollary 41 (b) that D(x) may be factorized into real
factors of the form
D(x) = (x - af(x -b) 1 ... (x 2 +px + q) m , (8-22)
where x = a, b, . . ., are real roots with multiplicities k, I, . . ., and (x 2 +
px + q) m represents an w-fold repeated pair of complex conjugate roots.
Then from elementary algebraic considerations it may be shown that when
the degree of N(x) is less than that of D(x) we may always set
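\[
\begin{aligned}
\frac{N(x)}{D(x)} ={}& \frac{A_1}{(x - a)} + \frac{A_2}{(x - a)^2} + \cdots + \frac{A_k}{(x - a)^k}
 + \frac{B_1}{(x - b)} + \frac{B_2}{(x - b)^2} + \cdots + \frac{B_l}{(x - b)^l} + \cdots \\
& + \frac{P_1 x + Q_1}{(x^2 + px + q)} + \frac{P_2 x + Q_2}{(x^2 + px + q)^2} + \cdots + \frac{P_m x + Q_m}{(x^2 + px + q)^m}.
\end{aligned}
\tag{8-23}
\]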
That is to say, every rational fraction may be expressed as a sum of simple
fractions of the types whose antiderivatives were obtained in Cases (a) to (d).
The expression on the right-hand side of Eqn (8-23) is called a partial
fraction expansion of the rational fraction $N(x)/D(x)$ and the coefficients
$A_1, A_2, \ldots, P_m, Q_m$ are called undetermined coefficients. The undetermined
coefficients may be found by cross-multiplication of this expression, followed
by equating the coefficients of equal powers of $x$. Antiderivatives of rational
fractions $N(x)/D(x)$ may thus be found by a combination of the method of
partial fractions and the results of Cases (a) to (d).
If the degree of $N(x)$ exceeds that of $D(x)$ by $n$, then the situation may be
reduced to the one just described by simply adding to the partial fraction
expansion (8-23) the extra terms
\[ R_0 + R_1 x + R_2 x^2 + \cdots + R_n x^n. \]
This result can also be achieved by first dividing N(x) by D(x). The circum-
stances usually dictate which approach is the easier.
Example 8-13 Evaluate
\[ I = \int \left(\frac{x^3 + 5x^2 + 9x + 5}{x^2 + 3x + 1}\right) dx. \]
Solution Here, as the degree of $N(x)$ only exceeds that of $D(x)$ by one, we
shall start by dividing the integrand to get
\[ \frac{x^3 + 5x^2 + 9x + 5}{x^2 + 3x + 1} = x + 2 + \frac{2x + 3}{x^2 + 3x + 1}, \]
whence
\[ I = \int (x + 2)\,dx + \int \frac{2x + 3}{x^2 + 3x + 1}\,dx. \]
The first antiderivative is trivial, whilst the second is of the form discussed in
Case (d), so that
\[ I = \frac{x^2}{2} + 2x + \log|x^2 + 3x + 1| + C. \]
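The division step used in this example can be checked mechanically. The following is only a minimal sketch: the helper name `poly_divmod` and the convention of listing coefficients from the highest power downwards are our own assumptions, not part of the text.

```python
from fractions import Fraction

def poly_divmod(numerator, denominator):
    """Divide one polynomial by another (coefficients listed highest power first).

    Returns (quotient, remainder) as lists of exact fractions.
    """
    num = [Fraction(c) for c in numerator]
    den = [Fraction(c) for c in denominator]
    quotient = []
    # Repeatedly cancel the leading term of what is left of the numerator.
    while len(num) >= len(den):
        factor = num[0] / den[0]
        quotient.append(factor)
        for k in range(len(den)):
            num[k] -= factor * den[k]
        num.pop(0)  # the leading coefficient is now zero
    return quotient, num

# N(x) = x^3 + 5x^2 + 9x + 5 and D(x) = x^2 + 3x + 1
quotient, remainder = poly_divmod([1, 5, 9, 5], [1, 3, 1])
print(quotient, remainder)
```

The quotient coefficients correspond to $x + 2$ and the remainder coefficients to $2x + 3$, in agreement with the division shown in the solution.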
Example 8-14 Evaluate
\[ I = \int \frac{x\,dx}{(x + 2)^2 (x - 1)}. \]
Solution In this case we must adopt the partial fraction expansion
\[ \frac{x}{(x + 2)^2 (x - 1)} = \frac{A}{x + 2} + \frac{B}{(x + 2)^2} + \frac{C}{x - 1}. \]
Cross-multiplication gives
\[ x = A(x + 2)(x - 1) + B(x - 1) + C(x + 2)^2 \]
or
\[ x = A(x^2 + x - 2) + B(x - 1) + C(x^2 + 4x + 4). \]
Equating coefficients of equal powers of $x$ gives:
Coefficient of $x^2$: $0 = A + C$
Coefficient of $x$: $1 = A + B + 4C$
Coefficient of $x^0$: $0 = -2A - B + 4C$,
showing that $A = -1/9$, $B = 2/3$, and $C = 1/9$. We may thus write
\[ \int \frac{x\,dx}{(x + 2)^2 (x - 1)} = -\frac{1}{9}\int \frac{dx}{x + 2} + \frac{2}{3}\int \frac{dx}{(x + 2)^2} + \frac{1}{9}\int \frac{dx}{x - 1}. \]
These antiderivatives were all discussed in Case (a), so that using those results
we obtain
\[ I = -\frac{1}{9}\log|x + 2| - \frac{2}{3}\,\frac{1}{x + 2} + \frac{1}{9}\log|x - 1| + C. \]
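As a hedged check on the undetermined coefficients of this example, the two sides of the partial fraction expansion may be compared exactly at a few rational points. The function names `lhs` and `rhs` and the choice of sample points are illustrative assumptions only.

```python
from fractions import Fraction

def lhs(x):
    # The original rational fraction x / ((x + 2)^2 (x - 1)).
    return x / ((x + 2) ** 2 * (x - 1))

def rhs(x, A, B, C):
    # The proposed partial fraction expansion.
    return A / (x + 2) + B / (x + 2) ** 2 + C / (x - 1)

A, B, C = Fraction(-1, 9), Fraction(2, 3), Fraction(1, 9)

# Any points other than the roots x = -2 and x = 1 of D(x) will do.
sample_points = [Fraction(n) for n in (0, 2, 3, 5, -1, -3)] + [Fraction(1, 2)]
expansion_ok = all(lhs(x) == rhs(x, A, B, C) for x in sample_points)
print(expansion_ok)
```

Because exact fractions are used, agreement at these points is not affected by rounding error.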
Example 8-15 Find the antiderivative
\[ I = \int \frac{x^4 - x^3 + 5x^2 + x + 3}{(x + 1)(x^2 - x + 1)^2}\,dx. \]
Solution Here $N(x) = x^4 - x^3 + 5x^2 + x + 3$ and $D(x) = (x + 1)(x^2 - x + 1)^2$,
so that the degree of $N(x)$ is 4 and the degree of $D(x)$ is 5. Following
on from our earlier reasoning we must set
\[ \frac{x^4 - x^3 + 5x^2 + x + 3}{(x + 1)(x^2 - x + 1)^2} = \frac{A}{x + 1} + \frac{Bx + C}{x^2 - x + 1} + \frac{Dx + E}{(x^2 - x + 1)^2}. \]
Cross-multiplication gives the identity
\[ x^4 - x^3 + 5x^2 + x + 3 = A(x^2 - x + 1)^2 + (Bx + C)(x + 1)(x^2 - x + 1) + (Dx + E)(x + 1). \]
Instead of expanding the right-hand side and then equating coefficients of
equal powers of $x$ as in the previous example, we shall use the fact that
$(x + 1)$ is a factor of $D(x)$ to simplify this expression. Setting $x = -1$ we
find that $9 = 9A$, or $A = 1$ and so
\[ x^4 - x^3 + 5x^2 + x + 3 = (x^2 - x + 1)^2 + (Bx + C)(x^3 + 1) + (Dx + E)(x + 1), \]
whence
\[ x^3 + 2x^2 + 3x + 2 = (Bx + C)(x^3 + 1) + (Dx + E)(x + 1). \]
Having eliminated $A$ we now proceed as before and equate coefficients of
equal powers of $x$ to find $B$, $C$, $D$, and $E$:
Coefficient of $x^4$: $0 = B$
Coefficient of $x^3$: $1 = C$
Coefficient of $x^2$: $2 = D$
Coefficient of $x$: $3 = B + E + D$
Coefficient of $x^0$: $2 = C + E$.
Thus, $B = 0$, $C = 1$, $D = 2$, $E = 1$ and so
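\[ I = \int \frac{dx}{x + 1} + \int \frac{dx}{x^2 - x + 1} + \int \frac{2x + 1}{(x^2 - x + 1)^2}\,dx = I_1 + I_2 + I_3. \]
Now
\[ I_1 = \int \frac{dx}{x + 1} = \log|x + 1| + C_1 \]
and
\[ I_2 = \int \frac{dx}{(x - \tfrac{1}{2})^2 + (\sqrt{3}/2)^2} = \frac{2}{\sqrt{3}} \arctan\left(\frac{2x - 1}{\sqrt{3}}\right) + C_2. \]
To evaluate $I_3$ write
\[ I_3 = \int \frac{2x - 1}{(x^2 - x + 1)^2}\,dx + \int \frac{2\,dx}{(x^2 - x + 1)^2} = \frac{-1}{(x^2 - x + 1)} + \int \frac{2\,dx}{[(x - \tfrac{1}{2})^2 + (\sqrt{3}/2)^2]^2}. \]
Next, setting $x - \tfrac{1}{2} = (\sqrt{3}/2)\tan\theta$, so that $dx = (\sqrt{3}/2)\sec^2\theta\,d\theta$, gives
\[ \int \frac{2\,dx}{[(x - \tfrac{1}{2})^2 + (\sqrt{3}/2)^2]^2} = \int \frac{\sqrt{3}\sec^2\theta\,d\theta}{(\tfrac{3}{4}\sec^2\theta)^2} = \frac{16\sqrt{3}}{9}\int \cos^2\theta\,d\theta. \]
Using the identity $\cos^2\theta = \tfrac{1}{2}(1 + \cos 2\theta)$ this may be evaluated to give
\[ \int \frac{2\,dx}{[(x - \tfrac{1}{2})^2 + (\sqrt{3}/2)^2]^2} = \frac{8\sqrt{3}}{9}\left[\theta + \tfrac{1}{2}\sin 2\theta\right] + C_3 = \frac{8\sqrt{3}}{9}\left[\arctan\left(\frac{2x - 1}{\sqrt{3}}\right) + \frac{\sqrt{3}}{4}\left(\frac{2x - 1}{x^2 - x + 1}\right)\right] + C_3. \]
Hence we have shown that
\[ I_3 = \frac{-1}{(x^2 - x + 1)} + \frac{8\sqrt{3}}{9}\left(\arctan\left(\frac{2x - 1}{\sqrt{3}}\right) + \frac{\sqrt{3}}{4}\,\frac{2x - 1}{x^2 - x + 1}\right) + C_3. \]
Adding $I_1$, $I_2$, and $I_3$ to find $I$ finally gives
\[ I = \log|x + 1| + \frac{14\sqrt{3}}{9}\arctan\left(\frac{2x - 1}{\sqrt{3}}\right) + \frac{4x - 5}{3(x^2 - x + 1)} + C. \]
A factor $(x^2 - x + 1)^3$ in the denominator would have led to $\int \cos^4\theta\,d\theta$
and so, in general, we would obtain antiderivatives of the form $\int \cos^{2n}\theta\,d\theta$.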
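A result as lengthy as that of Example 8-15 is conveniently checked by differentiation. The sketch below does this numerically, comparing a central-difference derivative of the antiderivative with the integrand; the step length, the sample points and the function names are our own assumptions.

```python
import math

def integrand(x):
    q = x * x - x + 1
    return (x**4 - x**3 + 5 * x**2 + x + 3) / ((x + 1) * q * q)

def antiderivative(x):
    # Final result of Example 8-15 with the arbitrary constant omitted.
    q = x * x - x + 1
    return (math.log(abs(x + 1))
            + 14 * math.sqrt(3) / 9 * math.atan((2 * x - 1) / math.sqrt(3))
            + (4 * x - 5) / (3 * q))

def central_difference(f, x, h=1e-5):
    # Second-order approximation to the derivative f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

sample_points = (0.0, 0.5, 1.0, 2.0, 3.5)
max_error = max(abs(central_difference(antiderivative, x) - integrand(x))
                for x in sample_points)
print(max_error)
```

A small value of `max_error` supports, but of course does not prove, the correctness of the antiderivative.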
8-6 Other special techniques of integration
A great variety of different methods exist for evaluating particular types of
antiderivative, and in this final section we illustrate only a few specially
useful ones with the help of some examples. Extensive tables of integrals are
readily available and, where possible, should be used to minimise tedious
manipulation.
8-6 (a) Substitution t = tan x/2
If we write $t = \tan x/2$ it is easily proved by means of trigonometric identities
that
\[ \sin x = \frac{2t}{1 + t^2} \quad\text{and}\quad \cos x = \frac{1 - t^2}{1 + t^2}. \tag{8-24} \]
Using these results we can also establish the differential relation
\[ dx = \frac{2\,dt}{1 + t^2}. \tag{8-25} \]
Consequently, in principle, any rational fraction $R(\sin x, \cos x)$ that involves
only the sine and cosine functions may be transformed by means of (8-24)
into a rational fraction involving $t$. On account of this result and (8-25),
it then follows that
\[ I = \int R(\sin x, \cos x)\,dx = \int R\left[\frac{2t}{1 + t^2}, \frac{1 - t^2}{1 + t^2}\right]\frac{2\,dt}{1 + t^2}. \]
Thus $I$ has been transformed into an antiderivative of a rational function
involving $t$.
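Example 8-16 Evaluate
\[ I = \int \frac{\cos x\,dx}{1 + \sin x}. \]
Solution Transforming to the variable $t$ as indicated above gives
\[ I = \int \frac{2(1 - t)}{(1 + t)(t^2 + 1)}\,dt. \]
It is readily established that
\[ \frac{2(1 - t)}{(1 + t)(1 + t^2)} = \frac{2}{1 + t} - \frac{2t}{1 + t^2}, \]
showing that
\[ I = \int \frac{2\,dt}{1 + t} - \int \frac{2t\,dt}{1 + t^2} = \log(1 + t)^2 - \log(1 + t^2) + C. \]
Thus
\[ I = \log\left[\frac{(1 + t)^2}{1 + t^2}\right] + C = \log\left[1 + \frac{2t}{1 + t^2}\right] + C, \]
whence from (8-24),
\[ I = \log(1 + \sin x) + C. \]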
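The substitution formulae (8-24) and the result of Example 8-16 can be spot-checked numerically. This is a sketch only; the function names and sample values of $x$ are assumptions of ours, and the values are chosen so that $\tan x/2$ is defined and $1 + \sin x > 0$.

```python
import math

def half_angle_forms(x):
    # sin x and cos x expressed through t = tan(x/2), as in (8-24).
    t = math.tan(x / 2)
    return 2 * t / (1 + t * t), (1 - t * t) / (1 + t * t)

def antiderivative_in_t(t):
    # log (1 + t)^2 - log (1 + t^2), the form obtained before returning to x.
    return math.log((1 + t) ** 2) - math.log(1 + t * t)

sample_points = (0.3, 1.0, 2.0, -1.0)

identity_error = max(
    max(abs(half_angle_forms(x)[0] - math.sin(x)),
        abs(half_angle_forms(x)[1] - math.cos(x)))
    for x in sample_points)

result_error = max(
    abs(antiderivative_in_t(math.tan(x / 2)) - math.log(1 + math.sin(x)))
    for x in sample_points)

print(identity_error, result_error)
```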
8-6 (b) Integration of $R[x, \sqrt{ax^2 + bx + c}]$
We define $R[x, \sqrt{ax^2 + bx + c}]$ to be a rational fraction involving $x$ and
$\sqrt{ax^2 + bx + c}$. Special cases of this general type in which $b = 0$ have
been encountered in Examples 8-2 and 8-5 where it was shown that the
substitutions $x = \sin u$ and $x = \sinh u$ can be used to reduce the integrand to
one involving only trigonometric or hyperbolic functions. If it is of
trigonometric type then the technique of (a) above may be used to reduce the
integrand further to a rational function. If the integrand is of hyperbolic
type then the substitution
\[ t = \tanh x/2, \]
together with
\[ \sinh x = \frac{2t}{1 - t^2} \quad\text{and}\quad \cosh x = \frac{1 + t^2}{1 - t^2} \tag{8-26} \]
and the differential relation
\[ dx = \frac{2\,dt}{1 - t^2}, \tag{8-27} \]
will again reduce the integrand to a rational function.
If $b \neq 0$, then completing the square under the square root sign gives
\[ \sqrt{ax^2 + bx + c} = \sqrt{a\left[\left(x + \frac{b}{2a}\right)^2 + \left(\frac{c}{a} - \frac{b^2}{4a^2}\right)\right]}. \]
The substitution $u = x + (b/2a)$ will then reduce the problem to one of the
two special cases just discussed, according to the signs of $a$ and
$[(c/a) - (b^2/4a^2)]$.
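Example 8-17 Evaluate
\[ I = \int \frac{dx}{\sqrt{2 - 3x - 4x^2}}. \]
Solution First we complete the square under the square root sign to obtain
\[ I = \int \frac{dx}{\sqrt{4[41/64 - (x + 3/8)^2]}}. \]
Then, setting $u = x + \tfrac{3}{8}$ this becomes
\[ I = \frac{1}{2}\int \frac{du}{\sqrt{(41/64) - u^2}} = \frac{1}{2}\arcsin\frac{8u}{\sqrt{41}} + C \]
and thus
\[ I = \frac{1}{2}\arcsin\left(\frac{8x + 3}{\sqrt{41}}\right) + C. \]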
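The antiderivative of Example 8-17 may be compared with a direct numerical quadrature over an interval on which $2 - 3x - 4x^2$ is positive. The sketch below uses Simpson's rule; the interval $[-0.5, 0]$, the number of subintervals and the function names are illustrative assumptions.

```python
import math

def integrand(x):
    return 1.0 / math.sqrt(2 - 3 * x - 4 * x * x)

def antiderivative(x):
    # Result of Example 8-17 with the arbitrary constant omitted.
    return 0.5 * math.asin((8 * x + 3) / math.sqrt(41))

def simpson(f, a, b, n=1000):
    # Composite Simpson's rule with an even number n of subintervals.
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

a, b = -0.5, 0.0
numerical_value = simpson(integrand, a, b)
exact_value = antiderivative(b) - antiderivative(a)
print(numerical_value, exact_value)
```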
8-6 (c) Integration by means of differentiation under integral sign
This approach utilizes the idea of differentiation under the integral sign with
respect to a parameter. It relies on finding a known antiderivative involving a
parameter $\alpha$, say, with the property that the derivative of its integrand with
respect to this parameter $\alpha$ is capable of being simply related to the integrand
of the desired antiderivative. Specifically, the method uses the result that if
\[ F(x, \alpha) = \int f(x, \alpha)\,dx \]
then,
\[ \frac{\partial F(x, \alpha)}{\partial \alpha} = \int \frac{\partial f(x, \alpha)}{\partial \alpha}\,dx. \]
Example 8-18 Evaluate by means of differentiation under the integral sign
the antiderivative
\[ I = \int \frac{dx}{(x^2 + a^2)^{3/2}}. \]
Solution We first note that the integrand $1/(x^2 + a^2)^{3/2}$ is simply related to
the derivative
\[ \frac{\partial}{\partial a}\left[\frac{1}{(x^2 + a^2)^{1/2}}\right]. \]
Accordingly, let us consider the familiar antiderivative
\[ \int \frac{dx}{(x^2 + a^2)^{1/2}} = \operatorname{arcsinh}\frac{x}{a} + C. \]
Then
\[ \frac{\partial}{\partial a}\int \frac{dx}{(x^2 + a^2)^{1/2}} = \frac{\partial}{\partial a}\left[\operatorname{arcsinh}\frac{x}{a} + C\right] \]
and so
\[ -\frac{1}{2}\int \frac{2a\,dx}{(x^2 + a^2)^{3/2}} = -\left(\frac{x}{a^2}\right)\frac{1}{((x/a)^2 + 1)^{1/2}} \]
or,
\[ \int \frac{dx}{(x^2 + a^2)^{3/2}} = \frac{x}{a^2 (x^2 + a^2)^{1/2}} + C'. \]
The arbitrary constant $C'$ has been added since we are deducing an
antiderivative and not just an indefinite integral.
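Both steps of Example 8-18 lend themselves to a quick numerical check: the derivative of arcsinh $(x/a)$ with respect to the parameter, and the derivative of the final result with respect to $x$. The following is a sketch under our own choice of sample values ($x = 1.5$, $a = 2$, with $a > 0$ assumed) and function names.

```python
import math

def known_antiderivative(x, a):
    # arcsinh(x/a), the antiderivative of 1/(x^2 + a^2)^(1/2).
    return math.asinh(x / a)

def result(x, a):
    # x / (a^2 (x^2 + a^2)^(1/2)), the antiderivative found in the example.
    return x / (a * a * math.sqrt(x * x + a * a))

def target_integrand(x, a):
    return 1.0 / (x * x + a * a) ** 1.5

x0, a0, h = 1.5, 2.0, 1e-5

# Derivative of arcsinh(x/a) with respect to the parameter a.
d_da = (known_antiderivative(x0, a0 + h) - known_antiderivative(x0, a0 - h)) / (2 * h)
# Differentiation under the integral sign predicts this equals -a * result(x, a).
parameter_error = abs(d_da - (-a0 * result(x0, a0)))

# Derivative of the result with respect to x should reproduce the integrand.
d_dx = (result(x0 + h, a0) - result(x0 - h, a0)) / (2 * h)
integrand_error = abs(d_dx - target_integrand(x0, a0))

print(parameter_error, integrand_error)
```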
8-6 (d) Integration of trigonometric functions involving multiple angles
Antiderivatives of products of trigonometric functions involving multiple
angles are of considerable importance and the most frequently occurring
ones are:
\[ I_1 = \int \sin mx \cos nx\,dx, \tag{8-28} \]
\[ I_2 = \int \sin mx \sin nx\,dx, \tag{8-29} \]
\[ I_3 = \int \cos mx \cos nx\,dx. \tag{8-30} \]
These are easily evaluated by appeal to the trigonometric identities:
\[ \sin mx \cos nx = \tfrac{1}{2}[\sin (m + n)x + \sin (m - n)x], \tag{8-31} \]
\[ \sin mx \sin nx = \tfrac{1}{2}[\cos (m - n)x - \cos (m + n)x], \tag{8-32} \]
\[ \cos mx \cos nx = \tfrac{1}{2}[\cos (m + n)x + \cos (m - n)x]. \tag{8-33} \]
Substitution of these identities into the above antiderivatives produces:
\[ I_1 = \begin{cases} -\dfrac{1}{2}\left[\dfrac{\cos (m - n)x}{(m - n)} + \dfrac{\cos (m + n)x}{(m + n)}\right] + C & \text{for } m^2 \neq n^2 \\[2ex] -\dfrac{1}{4m}\cos 2mx + C & \text{for } m = n, \end{cases} \tag{8-34} \]
\[ I_2 = \begin{cases} \dfrac{1}{2}\left[\dfrac{\sin (m - n)x}{(m - n)} - \dfrac{\sin (m + n)x}{(m + n)}\right] + C & \text{for } m^2 \neq n^2 \\[2ex] \dfrac{1}{2m}(mx - \sin mx \cos mx) + C & \text{for } m = n, \end{cases} \tag{8-35} \]
\[ I_3 = \begin{cases} \dfrac{1}{2}\left[\dfrac{\sin (m - n)x}{(m - n)} + \dfrac{\sin (m + n)x}{(m + n)}\right] + C & \text{for } m^2 \neq n^2 \\[2ex] \dfrac{1}{2m}(mx + \sin mx \cos mx) + C & \text{for } m = n. \end{cases} \tag{8-36} \]
Example 8-19 Evaluate the following two antiderivatives:
\[ I_1 = \int \sin 3x \cos 5x\,dx, \qquad I_2 = \int \sin^2 3x\,dx. \]
Solution The antiderivatives follow immediately by substitution in (8-34)
and (8-35):
\[ I_1 = \frac{\cos 2x}{4} - \frac{\cos 8x}{16} + C, \qquad I_2 = \frac{x}{2} - \frac{\sin 3x \cos 3x}{6} + C. \]
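Formulae (8-34) and (8-35) are easily coded, which gives an independent check on Example 8-19. This is a sketch only: the function names are ours, the arbitrary constant is omitted, and the first branch assumes $m^2 \neq n^2$ whenever $m \neq n$.

```python
import math

def I1(m, n, x):
    # Antiderivative of sin(mx) cos(nx), following (8-34).
    if m == n:
        return -math.cos(2 * m * x) / (4 * m)
    return -0.5 * (math.cos((m - n) * x) / (m - n)
                   + math.cos((m + n) * x) / (m + n))

def I2(m, n, x):
    # Antiderivative of sin(mx) sin(nx), following (8-35).
    if m == n:
        return (m * x - math.sin(m * x) * math.cos(m * x)) / (2 * m)
    return 0.5 * (math.sin((m - n) * x) / (m - n)
                  - math.sin((m + n) * x) / (m + n))

def derivative(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.7
# Differentiating the antiderivatives should reproduce the integrands.
error_1 = abs(derivative(lambda x: I1(3, 5, x), x0) - math.sin(3 * x0) * math.cos(5 * x0))
error_2 = abs(derivative(lambda x: I2(3, 3, x), x0) - math.sin(3 * x0) ** 2)

# The explicit forms quoted in Example 8-19.
explicit_1 = math.cos(2 * x0) / 4 - math.cos(8 * x0) / 16
explicit_2 = x0 / 2 - math.sin(3 * x0) * math.cos(3 * x0) / 6
print(error_1, error_2)
```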
PROBLEMS
Section 8-1
8-1 Find the following antiderivatives:
(a)
(d)
[4^; (e) ficos4^dx; (f) f 3* d*.
8-2 Verify by means of differentiation that
\[ \int \frac{dx}{\sqrt{x^2 - a^2}} = \log|x + \sqrt{x^2 - a^2}| + C. \]
Compare this form of result with that shown against entry 10 of Table 8-1.
8-3 Verify by means of differentiation that
\[ \int \frac{dx}{a^2 - b^2 x^2} = \frac{1}{2ab}\log\left|\frac{a + bx}{a - bx}\right| + C. \]
Compare this more general result with those shown against entries 11 and 12
of Table 8-1.
8-4 Verify by means of differentiation that
[
Compare this form of result with that shown against entry 9 of Table 8-1.
8-5 Use the result of Theorem 8-1 to verify the following general results:
(a) $\int \dfrac{f'}{f}\,dx = \log|f| + C$;
(b) $\int f^{(n+1)} g\,dx = g f^{(n)} - g' f^{(n-1)} + g'' f^{(n-2)} + \cdots
+ (-1)^n g^{(n)} f + (-1)^{n+1} \int g^{(n+1)} f\,dx$;
"Hffl'-i
= ^ + C;
(d) U£l — £Z\dx = iog
+ c.
8-6 Apply the results of Problem 8-5 together with some slight manipulation to
determine the following antiderivatives :
, . f/2x sin x — x 2 cos x\ ,
(a) —„ Ax;
' ' sin 2 x '
J(
(b) j{ J?2*Z*ll ) dx;
(c) Jx 2 e*/ 2 dx;
, f/x sinhx — 3 cosh x\
J \ x cosh x J
8-7 Evaluate the following antiderivatives:
(a) J (x 2 + 3 sin x + l)dx; (b) J (4* + 2 cos 2x)dx;
(c) J (4 sinh x + sin x)dx; (d) J (e M + 3)d.v.
8-8 Use the following identities to evaluate the four antiderivatives listed below:
\[ \sinh mx \cosh nx = \tfrac{1}{2}[\sinh (m + n)x + \sinh (m - n)x] \]
\[ \sinh mx \sinh nx = \tfrac{1}{2}[\cosh (m + n)x - \cosh (m - n)x] \]
\[ \cosh mx \cosh nx = \tfrac{1}{2}[\cosh (m + n)x + \cosh (m - n)x] \]
(a) J sinh 4x cosh 2x dx; (b) J sinh x sinh 3x dx;
(c) J cosh 4x cosh 2x dx; (d) J cosh 2 2x dx.
Section 8-2
Use the indicated substitutions to evaluate the following antiderivatives.
8-9 f d * .. , x = l/ii.
J xv(x 2 — 4)
810 J V(l - x 2 )dx, a: = sin u.
« -.-. C tanh x dx , .
811 TT, — Z rC> cosh x=l + u 2 .
J 2 V(cosh x — 1)
8-12 J cos *\/sin x dx, sin x = u.
8-13 J x(3x 2 + l) 5 dx, 3x 2 + 1 = u.
8 - 14 /v§TT)' ^+0-..
Evaluate the following antiderivatives by means of a suitable trigonometric
substitution.
J V(l - * a )
f V(x* +
816 I vv ~ ' J) dx.
817 I -=^- -dx.
Evaluate the following definite integrals.
8-18 f (3x + 1) sinh (x 3 + x + 3)dx.
8-19 f *V(1 + * a )d*-
f V(* -
8-20 f a/(* - 2)dx.
Section 8-3
Evaluate the following antiderivatives using the technique of integration by parts.
8-22 J Q ax sin x dx.
8-23 J xe ax dx.
8-24
f xdx
J sin 2 x
8-25 J sin x sinh x dx. \
8-26 J 7* cos x dx.
8-27 J log 2 x dx.
8-28 J x arcsin x dx.
Section 8-4
8-29 Given that $I_n = \int (1 - x^3)^n\,dx$, where $n$ is an integer, show that
\[ (3n + 1) I_n = x(1 - x^3)^n + 3n I_{n-1}. \]
Hence prove that
\[ \int_0^1 (1 - x^3)^5\,dx = \frac{3^6}{2^4 \cdot 7 \cdot 13}. \]
8-30 The integral I m is denned by
lm —
Show that
x
— r — — -j dx for integral m >. 0.
(x 2 + l) m+3 6
_ m + 2
Jm-l — 7 lm,
m — I
and by using the substitution x = tan prove that
1
sin 7 cos 5 6 d0 = — •
o
8-31 Show that for integral n > 1,
x" sin x dx = n \ x n ~ x cos x dx
Jo Jo
and
J' Jit /*i>7
x" cos x dx = (iw)» — n x" -1 sin x dx
o Jo
Use the result to evaluate
x 2 cos x dx.
Jo
8-32 The function $I_{p,q}$ is defined by
\[ I_{p,q} = \int x^p (\log x)^q\,dx \]
in which $p, q$ are positive integers. Show that
\[ (p + 1) I_{p,q} + q I_{p,q-1} = x^{p+1} (\log x)^q. \]
8-33 If
T M = J tan" d0,
where n # 1 is a positive integer, show that
tan"- 1
Jra = z Tn-2.
n — 1
Use this result to evaluate
tan 6 d0.
8-34 The function $I_{m,n}$ is defined by
\[ I_{m,n} = \int x^m (a + bx)^n\,dx, \]
in which $m, n$ are positive integers. Prove that
\[ b(m + n + 1) I_{m,n} + ma I_{m-1,n} = x^m (a + bx)^{n+1}. \]
8-35 The function $I_{m,n}$ is defined by
\[ I_{m,n} = \int \sin^m \theta \cos^n \theta\,d\theta, \]
in which $m, n$ are positive integers. Show that $I_{m,n}$ satisfies the reduction
formula
\[ (m + n) I_{m,n} - (n - 1) I_{m,n-2} = \sin^{m+1} \theta \cos^{n-1} \theta. \]
Section 8-5
Evaluate the following antiderivatives by means of partial fractions.
8-36 f ^
J (x - l)(x + 2){x + 3)
8 . 37 r *;-f + v
J x 2 - 5x + 6
838 f 3 f >4 . •
J x 6 — 2x i + x
8-41 f ^
J x 3 - 4x 2 + 5x - 2
C x 2 + 2
8 -43j ( , + 1) ; x _ 2) d-
f- *4 + 4*3+lb; 2 +12*+8
J (x 2 + 2x + 3) 2 (x + 1)
Section 8-6
Evaluate the following antiderivatives by means of the substitution t — tan x/2.
8 . 45 r ** —
J 3 + 5 cos x
g-46 l" *!
J sin x + cos x
8 . 47 f . . dx „
J 8 — 4 sin x + 7 cos x
„ ,„ f sin x
848 -dx.
J (1 — cos x) 3
Evaluate the following antiderivatives by means of one or more suitable sub-
stitutions.
r dx
J V(2 + 3x -
849 , ,_ „ ^
3x- 6
dx.
V(x 2 - 4x + 5)
dx
xV(.l - * 2 )
8-52 J v(* 2 + 2x + 5) dx
8-53 J (X-1W&-
2)
8-54 J V(* - x 2 ) dx.
Use the technique of differentiation under the integral sign to evaluate the
following antiderivatives.
8-55 f xe ax dx.
8-56 / (*» +* a y (Hint: start from / ^TI 2 " in Table 8 ' L)
8-57 / (a 2 -^)3/2 - (Hint : Start from / v^^- * 2 ) '" TaWe 81 - }
8-58 $\int x a^x\,dx$. (Hint: Start from $\int a^x\,dx$ in Table 8-1.)
Evaluate the following trigonometric antiderivatives.
8-59 J cos x cos 2x dx.
8-60 J sin ax sin (ox + e)dx, a, e non-zero constants.
8-61 j cos x cos 2 3x dx.
8-62 J sin x sin 2x sin 3x dx.
Use the results of this chapter together with Definitions 7-4 and 7-5 of Chapter 7
to classify the following improper integrals as convergent or divergent. Determine
the value of all improper integrals that are convergent stating any conditions that
must be imposed to ensure this.
.65 r dx
J (1 + x)Vx
/* 00
66 j cos x *
/*oo
•68 e^i
8-66 I cos x dx.
8-68 e^dx.
Linear transformations
and matrices
9-1 Introductory ideas
This chapter is concerned with the branch of mathematics known as linear
algebra. One aspect of this subject has already been encountered, namely
vectors, and it is now necessary to develop in a more general context various
of the ideas that were first introduced there. Central to the entire subject is
the fundamental idea that the algebraic operations of addition, subtraction,
and multiplication can be made meaningful when applied to an array of
numbers or functions considered as a single entity.
An example will help here to indicate one of the many different ways in
which such an array may arise, and at the same time to show something of
the type of algebra it is reasonable to want to perform on an array. Three
chemical plants numbered 1 to 3 each have separate sources of raw material
from which each one produces the same four products numbered 1 to 4. Let
plant number m produce product number n at a cost a mn units per ton, then
the production costs of the complex of chemical plants is conveniently
characterized by the following table of the twelve quantities a mn .
Table 9-1

                      Product
Plant      1          2          3          4
  1     $a_{11}$   $a_{12}$   $a_{13}$   $a_{14}$
  2     $a_{21}$   $a_{22}$   $a_{23}$   $a_{24}$
  3     $a_{31}$   $a_{32}$   $a_{33}$   $a_{34}$

In writing this table or array of quantities $a_{mn}$ we have used the convention
that the first of the two suffixes attached to the quantity $a_{mn}$ refers to the row
number in which $a_{mn}$ appears, and the second to the column number. Thus
the entry $a_{23}$ occurs in row 2, column 3, whilst the entry $a_{32}$ occurs in row 3,
column 2. The important use of suffixes in this way is strictly analogous to a
map reference in which the first entry is a latitude and the second a longitude.
Thus the double suffix notation used here serves to identify the position in
the array to which the associated quantity is assigned.
On account of the use to which the suffixes have been put, we can now
dispense with the extreme left-hand column and the top row of Table 9-1,
which only serve for identification purposes, and write instead
\[ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \tag{9-1} \]
with the understanding that the symbol A represents the array of quantities
originally contained in Table 9-1.
Returning now to the physical situation from which the array (9-1) was
derived, let us suppose that at some time the quality of the raw materials
changes, so that a revised Table 9-1 then applies in which entry $a_{mn}$ is replaced
by the new entry $b_{mn}$. Then, in terms of our concise notation, we can
characterize this new situation by defining an array B as follows:
\[ \mathbf{B} = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ b_{31} & b_{32} & b_{33} & b_{34} \end{bmatrix} \tag{9-2} \]
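In terms of the information at our disposal, we know that the change in
the cost of product $n$ from chemical plant $m$ is $a_{mn} - b_{mn}$, whilst the average
cost of product $n$ from plant $m$ is $\tfrac{1}{2}(a_{mn} + b_{mn})$. Hence, if C is the array of
change in costs of products and D is the array of the average costs of
products, in our new notation we may write:
\[ \mathbf{C} = \begin{bmatrix} a_{11} - b_{11} & a_{12} - b_{12} & a_{13} - b_{13} & a_{14} - b_{14} \\ a_{21} - b_{21} & a_{22} - b_{22} & a_{23} - b_{23} & a_{24} - b_{24} \\ a_{31} - b_{31} & a_{32} - b_{32} & a_{33} - b_{33} & a_{34} - b_{34} \end{bmatrix} \tag{9-3} \]
and
\[ \mathbf{D} = \begin{bmatrix} \tfrac{1}{2}(a_{11} + b_{11}) & \tfrac{1}{2}(a_{12} + b_{12}) & \tfrac{1}{2}(a_{13} + b_{13}) & \tfrac{1}{2}(a_{14} + b_{14}) \\ \tfrac{1}{2}(a_{21} + b_{21}) & \tfrac{1}{2}(a_{22} + b_{22}) & \tfrac{1}{2}(a_{23} + b_{23}) & \tfrac{1}{2}(a_{24} + b_{24}) \\ \tfrac{1}{2}(a_{31} + b_{31}) & \tfrac{1}{2}(a_{32} + b_{32}) & \tfrac{1}{2}(a_{33} + b_{33}) & \tfrac{1}{2}(a_{34} + b_{34}) \end{bmatrix} \tag{9-4} \]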
The form of these results is suggestive, for it would seem that by defining
subtraction of two similar arrays to mean the array formed by the subtraction
of corresponding elements, we may write
\[ \mathbf{C} = \mathbf{A} - \mathbf{B}. \tag{9-5} \]
Similarly, if addition of two similar arrays is taken to mean the array
formed by the addition of corresponding entries, and the multiplication of an
array by a factor is taken to mean the array formed by the multiplication of
each entry by that factor, we may write
\[ \mathbf{D} = \tfrac{1}{2}(\mathbf{A} + \mathbf{B}). \tag{9-6} \]
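The entry-by-entry operations behind (9-5) and (9-6) may be sketched as follows. The numerical costs are purely hypothetical figures chosen for illustration, and the function names `subtract`, `add` and `scale` are our own.

```python
def subtract(A, B):
    # Array formed by subtracting corresponding entries.
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def add(A, B):
    # Array formed by adding corresponding entries.
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def scale(factor, A):
    # Array formed by multiplying each entry by the factor.
    return [[factor * a for a in row] for row in A]

# Hypothetical costs per ton: 3 plants (rows) by 4 products (columns).
A = [[10, 12, 9, 15],
     [11, 13, 8, 14],
     [12, 11, 10, 16]]
B = [[12, 12, 10, 14],
     [11, 15, 9, 14],
     [13, 10, 10, 18]]

C = subtract(A, B)        # change in costs, C = A - B
D = scale(0.5, add(A, B)) # average costs, D = (A + B)/2
print(C)
print(D)
```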
Hence, in a natural manner, we are starting to perform what appears to
be conventional algebraic operations on an entire array of numbers, rather
than on the individual entries in the arrays themselves. In mathematical
terms an array of the form shown on the right-hand side of Eqn (9-1) is called
a matrix of order (3 X 4). Here, analogous to the double suffix notation
already introduced, the first number is taken to refer to the total number of
rows in the matrix and the second number to refer to the total number of
columns in the matrix.
In terms of the simple physical situation used to introduce the notion of a
matrix and its associated algebra we have so far given no indication of the
interpretation to be placed upon multiplication. To elucidate the form taken
by this operation when applied to matrices, we again return to our physical
situation and consider the cost of buying $c_1$, $c_2$, $c_3$, and $c_4$ tons, respectively,
of products 1, 2, 3, and 4 from each of the three chemical plants in turn. If
the product costs are as shown in Table 9-1, and the costs of the orders are
denoted by $d_1$, $d_2$, and $d_3$, it is readily seen that
\[
\begin{aligned}
d_1 &= a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + a_{14}c_4 \\
d_2 &= a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + a_{24}c_4 \\
d_3 &= a_{31}c_1 + a_{32}c_2 + a_{33}c_3 + a_{34}c_4.
\end{aligned}
\tag{9-7}
\]
In terms of the matrix A in Eqn (9-1), the right-hand side of the first
equation in (9-7) is obtained by multiplying successive entries in the first row
of A by $c_1$, $c_2$, $c_3$, and $c_4$, respectively, and then adding the four products.
The same process will generate the right-hand side of both the second and
third equation in (9-7), provided that the entries in the second and third rows
of matrix A are used in place of those in the first row. If the four numbers
$c_1$, $c_2$, $c_3$, and $c_4$ are arranged in a column which is then regarded as a (4 × 1)
matrix, the basic operation of matrix multiplication is seen to be the
multiplication of a row of the first matrix into the column of the second to yield a
single number. Thus, in terms of the first row of A expressed as a (1 × 4)
matrix, we have the definition
\[ a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + a_{14}c_4 = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}, \]
where juxtaposition is used to imply multiplication of the row and column
matrices on the right-hand side.
Similarly, in terms of the second row of A expressed as a (1 × 4) matrix,
our definition yields
\[ a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + a_{24}c_4 = \begin{bmatrix} a_{21} & a_{22} & a_{23} & a_{24} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}, \]
and a corresponding result is also true for the third row of A when expressed
as a (1 × 4) matrix. This special form of product is called either the inner
product or the scalar product of a row matrix and a column matrix.
Collectively these results suggest that we should write Eqns (9-7) in the
matrix form
\[ \begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} \tag{9-8} \]
with the understanding, as before, that multiplication is implied by juxta-
position and means the inner product of rows of the first matrix with the
column of the second matrix. To be consistent, equality of two matrices must
then be taken to mean the equality of corresponding entries in two matrices
of similar order. Using this convention our suffix notation works for us in
the sense that the row number and the column number, taken in that order,
which are involved in an inner product are the row and column numbers of
the location into which that product is to be put. Thus in matrix equation
(9-8), the number $d_2$ is in row 2, column 1 of the left-hand column matrix,
and it is the result of forming the inner product of row 2 of the first matrix
on the right-hand side with column 1 of the second matrix. (The second
matrix here only has one column.)
If the column matrix with entries $d_1$, $d_2$, $d_3$ is denoted by D, and the column
matrix with entries $c_1$, $c_2$, $c_3$, and $c_4$ is denoted by C, then Eqn (9-8) can be
reduced to the deceptively simple equation
\[ \mathbf{D} = \mathbf{A}\mathbf{C}. \tag{9-9} \]
It should be noticed that the resemblance to the algebra of real numbers ends
here, because although multiplication is a commutative operation for real
numbers, it is an easy task for the reader to verify that the matrix product
CA is not even defined for the matrices involved here. Later we shall see that
the non-commutative character of matrix multiplication is not the only
difference between the field of real numbers and matrices. The result of matrix
multiplication using numbers is illustrated in the following example:
\[ \begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 1 & 1 & 3 \\ 1 & 2 & 1 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \\ 0 \end{bmatrix}. \]
We remark in passing that the name scalar product of a row matrix and
a column matrix derives from a comparison with the scalar product of two
vectors. Namely, if $\boldsymbol{\alpha} = \alpha_1 \mathbf{i} + \alpha_2 \mathbf{j} + \alpha_3 \mathbf{k}$, $\boldsymbol{\beta} = \beta_1 \mathbf{i} + \beta_2 \mathbf{j} + \beta_3 \mathbf{k}$ are two
vectors, then $\boldsymbol{\alpha} \cdot \boldsymbol{\beta} = \alpha_1\beta_1 + \alpha_2\beta_2 + \alpha_3\beta_3$, which is just the result of forming
the inner product of a row matrix with entries $\alpha_1$, $\alpha_2$, $\alpha_3$ and a column matrix
with entries $\beta_1$, $\beta_2$, $\beta_3$. Because of this similarity it is customary to refer to
matrices comprising only one row or one column as row vectors or column
vectors, respectively. Thus a general (1 × n) row vector may be considered as
a matrix representation of an ordinary form of vector having n components,
and which belongs to an n-dimensional space.
This simple idea proves to be very fruitful in more advanced accounts of
linear algebra where it leads to the study of what are called n-dimensional
vector spaces. These spaces have properties very similar to those discussed in
Chapter 4 and, as in three dimensions, the scalar product is related to the
geometrical operation of projection in the space. In an n-dimensional vector
space a fundamental set of row or column vectors called a basis takes the
place of the unit vectors i, j, and k and leads to the important idea of linear
independence which will be examined later.
Because of the shape of the array, a general (m × n) matrix is called a
rectangular matrix. The rule just devised for the product of a (3 × 4) matrix
and a (4 × 1) column vector also applies to the product AB of two rectangular
matrices A and B, provided only that the number of columns in A is equal to
the number of rows of B. This last requirement follows directly from the
concept of an inner product which is only defined when the number of entries
in a row of A is equal to the number of entries in a column of B. Once again
the suffix notation works for us, because the inner product of row $p$ of matrix
A and column $q$ of matrix B is the number $c_{pq}$, which is found in row $p$ and
column $q$ of the product matrix C = AB. Consider the following example
which illustrates the application of this rule:
\[ \begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 1 & 1 & 3 \\ 1 & 2 & 1 & 4 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2 \\ 0 & 2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ -2 & 7 \\ 0 & 11 \end{bmatrix}. \]
Then, for example, the entry in row 3, column 2 of the product matrix is the
number 11, which is the inner product of row 3 of the first matrix involved in
the product and column 2 of the second matrix involved in the product.
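The row-into-column rule translates directly into a short routine. The sketch below is illustrative only: the function names are ours, matrices are stored as lists of rows, and the zero entries of the numerical example are written out as we read them from the printed arrays.

```python
def inner_product(row, column):
    # Defined only when row and column have the same number of entries.
    if len(row) != len(column):
        raise ValueError("row and column have different numbers of entries")
    return sum(r * c for r, c in zip(row, column))

def matmul(A, B):
    # Entry (p, q) of AB is the inner product of row p of A with column q of B.
    if len(A[0]) != len(B):
        raise ValueError("matrices are not conformable for multiplication")
    columns_of_B = list(zip(*B))
    return [[inner_product(row, col) for col in columns_of_B] for row in A]

A = [[1, 2, 1, 0],
     [0, 1, 1, 3],
     [1, 2, 1, 4]]
B = [[2, 1],
     [1, 2],
     [0, 2],
     [-1, 1]]

product = matmul(A, B)
print(product)        # a (3 x 2) matrix
print(product[2][1])  # entry in row 3, column 2
```

Attempting `matmul(B, B)` raises an error, since a (4 × 2) matrix cannot be multiplied into a (4 × 2) matrix.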
Notice that the rule for forming an inner product also determines the
shape of the product matrix C = AB, for C must have as many rows as A
and as many columns as B. (Think about this and check it.) In fact these
arguments may be formulated into a useful short-hand rule for checking that
two matrices are conformable for multiplication, and at the same time
displaying the shape of the product matrix.
Rule 1 (Multiplication conformability rule)
If A is an (m × n) matrix and B is a (p × q) matrix, then the matrix product
AB may be formed provided n = p. The resultant product matrix then has
the form (m × q). Symbolically we write this
(m × n)(p × q) = (m × q) only if n = p.
Thus matrix products of the form (3 × 7)(7 × 2) are conformable for
multiplication and yield a (3 × 2) matrix. Matrix products of the form
(7 × 3)(5 × 4) are not defined and certainly do not yield a (7 × 4) matrix.
This rule has various important implications, and at this stage in our
argument we would draw attention to the fact that even when for two matrices
A and B, both the matrix products AB and BA are defined, they are not
usually equal. Indeed, the order of the two product matrices may be different,
as the following example shows. If
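\[ \mathbf{A} = \begin{bmatrix} 1 & 2 \\ 0 & -1 \\ 4 & 1 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 1 & 2 & 1 \\ -1 & 1 & 0 \end{bmatrix} \]
then
\[ \mathbf{AB} = \begin{bmatrix} -1 & 4 & 1 \\ 1 & -1 & 0 \\ 3 & 9 & 4 \end{bmatrix} \]
and
\[ \mathbf{BA} = \begin{bmatrix} 5 & 1 \\ -1 & -3 \end{bmatrix}. \]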
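That AB and BA differ, even in shape, is quickly confirmed by computation. In this sketch the entries of A and B are taken as we read them from the printed arrays (with zero entries restored, which is an assumption), and `matmul` is a hypothetical helper of our own.

```python
def matmul(A, B):
    # Inner product of each row of A with each column of B.
    if len(A[0]) != len(B):
        raise ValueError("matrices are not conformable for multiplication")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2],
     [0, -1],
     [4, 1]]      # a (3 x 2) matrix
B = [[1, 2, 1],
     [-1, 1, 0]]  # a (2 x 3) matrix

AB = matmul(A, B)  # (3 x 2)(2 x 3) gives a (3 x 3) matrix
BA = matmul(B, A)  # (2 x 3)(3 x 2) gives a (2 x 2) matrix
print(AB)
print(BA)
```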
A different but most important way in which matrices can arise is in
dealing with sets of simultaneous equations. Consider the following set of
simultaneous equations :
\[
\begin{aligned}
x + y + 2z &= 4 \\
2x - y + 3z &= 9 \\
3x - y - z &= 2.
\end{aligned}
\]
These equations may be written in matrix form by introducing a column
vector with entries $x$, $y$, $z$ and then using the rule of matrix multiplication to
write
\[ \begin{bmatrix} 1 & 1 & 2 \\ 2 & -1 & 3 \\ 3 & -1 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ 9 \\ 2 \end{bmatrix}. \]
With only a little practice, the reader will quickly learn to transcribe systems
of equations into matrix form, for the patterns of numbers involved in the
two numerical matrices are identical to the patterns of numbers in the
equations themselves.
For obvious reasons the (3 X 3) matrix is called the coefficient matrix of
the simultaneous equations. As in this case there are three equations and
three unknowns, the coefficient matrix is square in shape. In general the name
square matrix will be given to any (n x n) matrix. If the coefficient matrix
above is denoted by A, and the column vectors with entries x, y, z and 4, 9, 2
are denoted, respectively, by X and K, we arrive at the matrix equation
AX = K.
There is a great temptation to attempt to solve this for X by dividing by
A, but as it is meaningless to divide two arrays of numbers this approach
must be abandoned. Later we will return to this matter and resolve the diffi-
culty by introducing the concept of the inverse of a square matrix via the
operation of multiplication.
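Although the systematic solution of AX = K is deferred, a proposed solution is easily tested by forming the product AX. In the sketch below the values $x = 1$, $y = -1$, $z = 2$ are not given in the text; they were found by ordinary elimination and are stated here as an assumption to be verified. The helper name `multiply_matrix_vector` is ours.

```python
def multiply_matrix_vector(A, X):
    # Each entry is the inner product of a row of A with the column vector X.
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[1, 1, 2],
     [2, -1, 3],
     [3, -1, -1]]   # coefficient matrix
K = [4, 9, 2]       # right-hand sides
X = [1, -1, 2]      # proposed solution (x, y, z), an assumption to be tested

AX = multiply_matrix_vector(A, X)
print(AX == K)
```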
One final and important way in which matrices may arise is in connection
with what are called linear transformations. The idea involved here is perhaps
best understood if described in terms of coordinate transformations, and for
this purpose we now confine attention to a special change of coordinates in
a plane.
Suppose a set of rectangular cartesian axes $O\{x', y'\}$ in a plane is derived
from a set of rectangular cartesian axes $O\{x, y\}$ by rotation about $O$ through
an angle $\theta$. Then under this process a point $P$ in the $(x, y)$-plane with
coordinates $(\xi, \eta)$ appears as a point with coordinates $(\xi', \eta')$ in the
$(x', y')$-plane, as shown in Fig. 9-1.
Simple geometrical considerations show that
\[
\begin{aligned}
\xi' &= \xi \cos\theta - \eta \sin\theta \\
\eta' &= \xi \sin\theta + \eta \cos\theta.
\end{aligned}
\]
Now this result is true for any point $P$ in the $(x, y)$-plane and its map in the
$(x', y')$-plane, so that with complete generality we may display the effect of
this coordinate transformation by writing
\[
\begin{aligned}
x' &= x \cos\theta - y \sin\theta \\
y' &= x \sin\theta + y \cos\theta.
\end{aligned}
\tag{9-10}
\]
Fig. 9-1 Rotation in a plane.

If the axes $O\{x', y'\}$ and $O\{x, y\}$ are thought of as belonging to two
different but superimposed planes with a common origin, then Eqns (9-10) may
be regarded as describing the relationship between points in each plane
when corresponding axes are inclined at an angle $\theta$. In this respect the
transformation described by Eqns (9-10) can be regarded as a function or
mapping, in the sense of Chapter 2, of the set of points comprising the
$(x, y)$-plane into the set of points comprising the $(x', y')$-plane. The mapping
is obviously one to one, and both the domain and range of the mapping are
the set of points comprising the plane itself. In matrix notation the relationship
becomes
\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \tag{9-11} \]
Hence by pursuing the simple idea of the geometrical operation of the
rotation of a plane about the origin we have arrived at the matrix
\[ \mathbf{R}_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \tag{9-12} \]
The idea involved here is a much more general one than that involved in
simultaneous equations, since $\mathbf{R}_\theta$ contains a complete description of how an
entire plane transforms or maps, together with whatever specific curves of
interest it may contain. In addition to this we have also produced an example
of a matrix whose entries, or elements as we shall call them henceforth, are
functions of a single real variable.
Accordingly, it is reasonable to ask whether any meaning can be given to
the entity $d\mathbf{R}_\theta/d\theta$, where $\mathbf{R}_\theta$ is a matrix whose elements are functions of the
real variable $\theta$. This is not an abstract matter, for in mechanics and many
other subjects it is frequently convenient to work with axes that are fixed in
a rotating body. Indeed, the same sort of idea was implicit in the example
first used to introduce matrices. In that case by regarding the quality of the
raw material as a function of the time $t$, we arrive at a (3 × 4) matrix $\mathbf{A}(t)$
whose elements $a_{mn}(t)$ are functions of time and any attempt to examine
rates of change involves considering the meaning of $d\mathbf{A}/dt$.
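The term linear transformation in relation to the rotation transformation
(9-11) comes about as follows. Consider the effect of a rotation on the two
points $(\alpha, \beta)$ and $(\gamma, \delta)$ which map into the points $(\alpha', \beta')$ and $(\gamma', \delta')$,
respectively. Then from Eqns (9-10) we have
\[
\begin{aligned}
\alpha' &= \alpha\cos\theta - \beta\sin\theta, & \gamma' &= \gamma\cos\theta - \delta\sin\theta, \\
\beta' &= \alpha\sin\theta + \beta\cos\theta, & \delta' &= \gamma\sin\theta + \delta\cos\theta,
\end{aligned}
\]
whence
\[
\begin{aligned}
\alpha' + \gamma' &= (\alpha + \gamma)\cos\theta - (\beta + \delta)\sin\theta, \\
\beta' + \delta' &= (\alpha + \gamma)\sin\theta + (\beta + \delta)\cos\theta.
\end{aligned}
\]
So, setting
\[
X = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \qquad
X^* = \begin{bmatrix} \gamma \\ \delta \end{bmatrix},
\]
we have in fact shown that
\[
R_\theta X + R_\theta X^* = R_\theta (X + X^*),
\tag{9-13}
\]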
which asserts that multiplication by $R_\theta$ is distributive with respect to addition.
It is the general property described by Eqn (9-13) that is used to characterize a
linear transformation, and it is on account of this that $R_\theta X$ is called a linear
transformation of the vector X. In fact matrix multiplication is always
distributive with respect to addition, as we shall see later.
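The distributive property (9-13) can also be checked numerically. The following is only an illustrative sketch: the angle and the two points are arbitrary choices, and the helper names `rotation`, `apply` and `add` are introduced here for the purpose and do not come from the text.

```python
import math

def rotation(theta):
    """Return the (2 x 2) rotation matrix of Eqn (9-12) as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(R, X):
    """Multiply the (2 x 2) matrix R by the column vector X = [x, y]."""
    return [R[0][0] * X[0] + R[0][1] * X[1],
            R[1][0] * X[0] + R[1][1] * X[1]]

def add(X, Y):
    """Add two column vectors element by element."""
    return [X[0] + Y[0], X[1] + Y[1]]

theta = 0.7                              # arbitrary angle, in radians
X, X_star = [1.0, 2.0], [-3.0, 0.5]      # arbitrary points of the plane
R = rotation(theta)

lhs = add(apply(R, X), apply(R, X_star))  # R X + R X*
rhs = apply(R, add(X, X_star))            # R (X + X*)

# The two sides agree up to floating-point rounding.
linear = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(lhs, rhs))
print(linear)
```

A tolerance is used in the comparison because the sines and cosines are evaluated in floating-point arithmetic, so exact equality cannot be relied upon.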
Thus far in our introductory presentation of matrices only intuitive argu-
ments have been used. This approach has been adopted deliberately in an
attempt to emphasize that matrices arise naturally, and that an obvious
algebra suggests itself for their manipulation. To proceed further it now
becomes necessary to formalize these ideas in exact mathematical terms, and
then to develop them in systematic form to the point at which they can be
used as a useful tool.
9-2 Matrix algebra
In this section we return to the fundamental ideas connected with matrices
and their algebra which were outlined on an intuitive basis in Section 9-1.
This time, however, our discussion will be more formal and, relying on our
introductory account to provide motivation, we shall proceed quickly
through the basic definitions and theorems, which will be illustrated by
example. The problem of the solution of systems of linear equations and a
discussion of linear transformations and some of their applications will be
presented in subsequent sections.
definition 9-1 (matrix and its order) A matrix is a rectangular array of
elements $a_{ij}$ involving m rows and n columns. The first suffix i in element $a_{ij}$ is
called the row index of the element and the second suffix j is called the column
index of the element. These indices specify the row number and column
number in which the element is located, with row 1 occurring at the top of
the array and column 1 at the extreme left. A matrix with m rows and n
columns is said to be of order m by n and this is written (m x n). The order
describes the shape of the matrix.
Special names are given to certain types of matrix and we now describe
and give examples of some of the more frequently used terms.
(a) A row matrix or row vector is any matrix of order (1 X n). The
following is an example of a row vector of order (1 X 4) :
[3 7 2].
(b) A column matrix or column vector is any matrix of order (n x 1). The
following is an example of a column vector of order (3 x 1):
11
(c) A square matrix is any matrix of order (n x n). The following is an
example of a square matrix of order (3 x 3):
"1 2
4"
3
2
5 1
3_
Three particular cases of square matrices that are worthy of note are the
diagonal matrix, the symmetric matrix and the skew-symmetric matrix. Of
these, the diagonal matrix has non-zero elements only on what is called the
principal diagonal, which runs from the top left of the matrix to the bottom
right. The principal diagonal is also often referred to as the leading diagonal.
The following is an example of a diagonal matrix of order (4x4):
3
0"
2
5.
The diagonal matrix in which every element of the principal diagonal is a
unity is called either the unit matrix or the identity matrix, and it is usually
denoted by I. The unit matrix of order (3 x 3) thus has the form
\[
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
A symmetric matrix is one in which the elements obey the rule $a_{ij} = a_{ji}$,
so that the pattern of numbers has a reflection symmetry about the principal
diagonal. A typical symmetric matrix of order (3 x 3) is:
2
-2
A skew-symmetric matrix is one in which the elements obey the rule
$a_{ij} = -a_{ji}$, so that the principal diagonal must contain zeros, whilst the
pattern of numbers has a reflection symmetry about the principal diagonal
but with a reversal of sign. A typical skew-symmetric matrix of order (3 x 3)
is:
\[
\begin{bmatrix} 0 & 1 & 5 \\ -1 & 0 & -3 \\ -5 & 3 & 0 \end{bmatrix}.
\]
(d) A null matrix is the name given to a matrix of any order which contains
only zero elements. It is usually denoted by the symbol 0. The null
matrix of order (2 x 3) has the form
\[
0 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]
definition 9-2 (equality of matrices) Two matrices A and B with general
elements $a_{ij}$ and $b_{ij}$, respectively, are equal only when they are both of the
same order and $a_{ij} = b_{ij}$ for all possible pairs of indices (i, j).
Example 9-1 Is it possible for the following pair of matrices to be equal
and, if so, for what value of a does equality occur:
\[
\begin{bmatrix} 5 & a^3 \\ a^2 & 1 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 5 & -27 \\ 9 & 1 \end{bmatrix}.
\]
Solution The matrices are both of the same order and hence they will be
equal when their corresponding elements are equal. As corresponding elements
on the principal diagonal are indeed equal, we need only confine attention
to the off-diagonal elements. Thus the matrices will be equal if there is a
common solution to the two equations $a^2 = 9$ and $a^3 = -27$. Obviously,
equality will occur if a = -3.
definition 9-3 (addition of matrices) Two matrices A and B with general
elements $a_{ij}$ and $b_{ij}$, respectively, will be said to be conformable for addition
only if they are both of the same order. Their sum C = A + B is the matrix
C with elements $c_{ij} = a_{ij} + b_{ij}$.
As addition of real numbers is commutative we have $a_{ij} + b_{ij} = b_{ij} + a_{ij}$.
This shows that addition of conformable matrices must also be commutative,
whence
\[
A + B = B + A.
\tag{9-14}
\]
Now addition of real numbers is also associative so that
$(a_{ij} + b_{ij}) + c_{ij} = a_{ij} + (b_{ij} + c_{ij})$. Hence if $a_{ij}$, $b_{ij}$, and $c_{ij}$ are general elements of
matrices A, B, and C which are conformable for addition, then this also
implies that addition of matrices is associative, whence
\[
(A + B) + C = A + (B + C).
\tag{9-15}
\]
Results (9-14) and (9-15) comprise our first theorem.
theorem 9-1 (matrix addition is both commutative and associative) If
A, B, and C are matrices which are conformable for addition, then
(a) A + B = B + A (Matrix Addition is Commutative);
(b) (A + B) + C = A + (B + C) (Matrix Addition is Associative).
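Example 9-2 Determine the constants a, b, c, and d in order that the
following matrix equation should be valid:
\[
\begin{bmatrix} 0 & a & 3 \\ b & 2 & 2 \end{bmatrix} +
\begin{bmatrix} c & 1 & 2 \\ 1 & 1 & d \end{bmatrix} =
\begin{bmatrix} 4 & 3 & 5 \\ 7 & 3 & 5 \end{bmatrix}.
\]
Solution Adding the two matrices on the left-hand side we arrive at the
matrix equation
\[
\begin{bmatrix} c & (a + 1) & 5 \\ (b + 1) & 3 & (d + 2) \end{bmatrix} =
\begin{bmatrix} 4 & 3 & 5 \\ 7 & 3 & 5 \end{bmatrix}.
\]
Equating corresponding elements shows that a = 2, b = 6, c = 4, and
d = 3.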
definition 9-4 (multiplication by scalar) If k is a scalar and the matrix
A has elements $a_{ij}$, then the matrix B = kA is the same order as A and has
elements $k a_{ij}$.
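Example 9-3 Determine 2A + 5B, given that:
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} -1 & 3 \\ 4 & 2 \end{bmatrix}.
\]
Solution
\[
2A + 5B = 2\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + 5\begin{bmatrix} -1 & 3 \\ 4 & 2 \end{bmatrix}
\]
or,
\[
2A + 5B = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix} + \begin{bmatrix} -5 & 15 \\ 20 & 10 \end{bmatrix},
\]
whence
\[
2A + 5B = \begin{bmatrix} -3 & 19 \\ 26 & 18 \end{bmatrix}.
\]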
definition 9-5 (difference of two matrices) If the matrices A and B are
both of the same order, then their difference A - B is defined by the relation
\[
A - B = A + (-1)B.
\]
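Example 9-4 Determine A - B, given that:
\[
A = \begin{bmatrix} 1 & 3 \\ 4 & -2 \\ 1 & 6 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 4 & 2 \\ 3 & 1 \\ 0 & -2 \end{bmatrix}.
\]
Solution
\[
A - B = \begin{bmatrix} 1 & 3 \\ 4 & -2 \\ 1 & 6 \end{bmatrix} + (-1)\begin{bmatrix} 4 & 2 \\ 3 & 1 \\ 0 & -2 \end{bmatrix}
\]
and so
\[
A - B = \begin{bmatrix} 1 & 3 \\ 4 & -2 \\ 1 & 6 \end{bmatrix} + \begin{bmatrix} -4 & -2 \\ -3 & -1 \\ 0 & 2 \end{bmatrix}
= \begin{bmatrix} -3 & 1 \\ 1 & -3 \\ 1 & 8 \end{bmatrix}.
\]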
definition 9-6 (matrix multiplication) The two matrices A and B with
general elements $a_{ij}$ and $b_{ij}$ are said to be conformable for matrix multiplication
provided that the number of columns in A equals the number of rows in
B. If A is of order (m x n) and B is of order (n x r), then the matrix product
AB is the matrix C of order (m x r) with elements $c_{ij}$, where
\[
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.
\]
The number $c_{ij}$ is called the inner product of the ith row of A with the jth
column of B.
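Example 9-5 Determine A + BC, given that:
\[
A = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 1 & 1 \end{bmatrix}
\quad\text{and}\quad
C = \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 2 \end{bmatrix}.
\]
Solution Matrix B is of order (2 x 3) and matrix C is of order (3 x 2),
showing that B and C are conformable for multiplication. We have
\[
BC = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 2 \end{bmatrix}
= \begin{bmatrix} 7 & 8 \\ 7 & 10 \end{bmatrix}
\]
and so
\[
A + BC = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix} + \begin{bmatrix} 7 & 8 \\ 7 & 10 \end{bmatrix}
= \begin{bmatrix} 8 & 12 \\ 9 & 13 \end{bmatrix}.
\]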
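The inner-product rule of Definition 9-6 translates directly into a short program. The sketch below is illustrative only: matrices are represented as lists of rows, the function names `mat_mul` and `mat_add` are introduced here rather than taken from the text, and C is taken to be the (3 x 2) matrix with rows (3, 4), (1, 0), (0, 2), which reproduces the product BC found in Example 9-5.

```python
def mat_mul(A, B):
    """Product AB of an (m x n) matrix A and an (n x r) matrix B."""
    if len(A[0]) != len(B):
        raise ValueError("matrices are not conformable for multiplication")
    # Element (i, j) is the inner product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_add(A, B):
    """Sum A + B of two matrices of the same order (Definition 9-3)."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("matrices are not conformable for addition")
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

A = [[1, 4], [2, 3]]
B = [[1, 4, 2], [2, 1, 1]]
C = [[3, 4], [1, 0], [0, 2]]

BC = mat_mul(B, C)
result = mat_add(A, BC)
print(BC)      # [[7, 8], [7, 10]]
print(result)  # [[8, 12], [9, 13]]
```

The conformability checks mirror the conditions in Definitions 9-3 and 9-6, so an attempt to form, say, A + B here would raise an error rather than return a meaningless result.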
On account of the fact that matrix multiplication is not normally com-
mutative, it is important to use a terminology that distinguishes between
matrix multipliers that appear on the left or the right in a matrix product.
This is achieved by adopting the convention that when matrix B is multiplied
by matrix A from the left to form the product AB, we shall say that B is pre-
multiplied by A. Conversely, when the matrix B is multiplied by A from the
right to form the product BA, we shall say that B is post-multiplied by A.
The most important results concerning matrix multiplication are con-
tained in the following theorem, which asserts that matrix multiplication is
distributive with respect to addition and that it is also associative.
theorem 9-2 (matrix multiplication is distributive and associative) If
matrices A, B, and C are conformable for multiplication, then :
(a) matrix multiplication is distributive with respect to addition, so that
A(B + C) = AB + AC;
(b) matrix multiplication is associative, so that
A(BC) = (AB)C.
Proof To establish result (a) let B and C be of order (m x n), and denote
their general elements by $b_{ij}$ and $c_{ij}$, respectively, so that the general element
of B + C is $b_{ij} + c_{ij}$. Then if A is of order (r x m) with general element $a_{ij}$,
and $d_{ij}$ is the general element of D = A(B + C) which is of order (r x n),
we have from Definition 9-6 that
\[
d_{ij} = a_{i1}(b_{1j} + c_{1j}) + a_{i2}(b_{2j} + c_{2j}) + \cdots + a_{im}(b_{mj} + c_{mj}).
\]
Performing the indicated multiplications and re-grouping we have
\[
d_{ij} = (a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}) + (a_{i1}c_{1j} + a_{i2}c_{2j} + \cdots + a_{im}c_{mj}).
\]
However, from Definition 9-6 this is seen to be equivalent to
\[
D = AB + AC,
\]
which was to be proved.
Result (b) may be established in similar fashion, and to achieve this we
assume A, B, and C to be respectively of order (p x q), (q x m), and (m x n)
with general elements $a_{ij}$, $b_{ij}$, and $c_{ij}$.
From Definition 9-6 we know that the general element occurring in row i,
column j of the product BC has the form
\[
b_{i1}c_{1j} + b_{i2}c_{2j} + \cdots + b_{im}c_{mj},
\]
so that the general element $d_{ij}$ occurring in row i, column j of the product
D = A(BC) which is of order (p x n) must have the form
\[
\begin{aligned}
d_{ij} ={}& a_{i1}(b_{11}c_{1j} + b_{12}c_{2j} + \cdots + b_{1m}c_{mj}) \\
&+ a_{i2}(b_{21}c_{1j} + b_{22}c_{2j} + \cdots + b_{2m}c_{mj}) \\
&+ \cdots \\
&+ a_{iq}(b_{q1}c_{1j} + b_{q2}c_{2j} + \cdots + b_{qm}c_{mj}).
\end{aligned}
\]
Re-grouping of the terms then gives
\[
\begin{aligned}
d_{ij} ={}& (a_{i1}b_{11} + a_{i2}b_{21} + \cdots + a_{iq}b_{q1})c_{1j} \\
&+ (a_{i1}b_{12} + a_{i2}b_{22} + \cdots + a_{iq}b_{q2})c_{2j} \\
&+ \cdots \\
&+ (a_{i1}b_{1m} + a_{i2}b_{2m} + \cdots + a_{iq}b_{qm})c_{mj}.
\end{aligned}
\]
Appealing once more to Definition 9-6 we find that this is equivalent to
\[
D = (AB)C,
\]
which was to be proved.
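Example 9-6 If
\[
A = \begin{bmatrix} 1 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}, \quad
C = \begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix},
\]
verify that
(a) A(B + C) = AB + AC,
(b) A(BC) = (AB)C.
Solution
(a) We have
\[
B + C = \begin{bmatrix} 3 & 4 \\ 2 & 3 \end{bmatrix}
\]
so that
\[
A(B + C) = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 3 & 4 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 7 & 10 \end{bmatrix};
\]
whereas
\[
AB = \begin{bmatrix} -1 & 7 \end{bmatrix} \quad\text{and}\quad AC = \begin{bmatrix} 8 & 3 \end{bmatrix},
\]
so that
\[
AB + AC = \begin{bmatrix} 7 & 10 \end{bmatrix}.
\]
(b) We have
\[
BC = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 11 & 4 \\ 4 & 1 \end{bmatrix}
\]
so that
\[
A(BC) = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 11 & 4 \\ 4 & 1 \end{bmatrix} = \begin{bmatrix} 19 & 6 \end{bmatrix};
\]
whereas
\[
AB = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 7 \end{bmatrix},
\]
whence
\[
(AB)C = \begin{bmatrix} -1 & 7 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 19 & 6 \end{bmatrix}.
\]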
An important matrix operation involves the interchange of rows and
columns of a matrix, thereby changing a matrix of order (m x n) into one of
order (n x m). Thus a row vector is changed into a column vector and a
matrix of order (3 x 2) is changed into a matrix of order (2 x 3). This
operation is called the operation of transposition and is denoted by the
addition of a prime to the matrix in question.
definition 9-7 (transposition operation) If A is a matrix of order (m x n),
then its transpose A' is the matrix of order (n x m) which is derived from A
by the interchange of rows and columns. Symbolically, if $a_{ij}$ is the element in
the ith row and jth column of A, then $a_{ji}$ is the element in the corresponding
position in A'.
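Example 9-7 Find A' and (A')', given that:
\[
A = \begin{bmatrix} 1 & 4 & 7 & 3 \\ 2 & -1 & 4 & -1 \end{bmatrix}.
\]
Solution Writing the first row in place of the first column and the second
row in place of the second column, as is required by Definition 9-7, we find
that
\[
A' = \begin{bmatrix} 1 & 2 \\ 4 & -1 \\ 7 & 4 \\ 3 & -1 \end{bmatrix}.
\]
The same argument shows that
\[
(A')' = \begin{bmatrix} 1 & 4 & 7 & 3 \\ 2 & -1 & 4 & -1 \end{bmatrix}.
\]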
It is obvious from the definition of the transpose operation that (A')' = A,
as was indeed illustrated in the last example. It is also obvious from
Definitions 9-3 and 9-5 that if A and B are conformable for addition, then
\[
(A \pm B)' = A' \pm B'.
\tag{9-16}
\]
Now if A is of order (m x n) and B is of order (n x r), and the general
matrix elements are $a_{ij}$ and $b_{ij}$, respectively, the element $c_{ij}$ in the ith row and
jth column of the matrix product C = AB is
\[
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.
\]
By definition, this is the element that will appear in the jth row and ith
column of (AB)'.
Applying the transpose operation separately to A and B we find that A'
is of order (n x m) and B' is of order (r x n), so that only the matrix product
B'A' is conformable.
Now the elements of the jth row of B' are the elements of the jth column
of B, and the elements of the ith column of A' are the elements of the ith
row of A, so that the element $d_{ji}$ in the jth row and ith column of the product
D = B'A' must be
\[
d_{ji} = b_{1j}a_{i1} + b_{2j}a_{i2} + \cdots + b_{nj}a_{in}
\]
or, equivalently,
\[
d_{ji} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.
\]
However, equating elements in the jth row and ith column of (AB)' and
B'A' we find that $c_{ij} = d_{ji}$, and so
\[
(AB)' = B'A'.
\tag{9-17}
\]
We summarize these results into a final theorem.
theorem 9-3 (properties of transposition operation) If A and B are con-
formable for addition or multiplication, as required, then :
(a) (A')' = A (Transposition is Reflexive);
(b) (A + B)' = A' + B';
(c) (A - B)' = A' - B';
(d) (AB)' = B'A'.
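Example 9-8 Verify that (AB)' = B'A', given that:
\[
A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix}.
\]
Solution We have
\[
AB = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix}
= \begin{bmatrix} 11 & 2 \\ 16 & 2 \end{bmatrix}
\]
so that
\[
(AB)' = \begin{bmatrix} 11 & 16 \\ 2 & 2 \end{bmatrix}.
\]
However,
\[
B'A' = \begin{bmatrix} 2 & 3 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
= \begin{bmatrix} 11 & 16 \\ 2 & 2 \end{bmatrix},
\]
which is equal to (AB)'.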
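The reversal of order in (AB)' = B'A' is easily checked by machine. The following sketch is illustrative only; the helper names `transpose` and `mat_mul` and the rectangular test matrices P and Q are assumptions introduced here, while A and B are those of Example 9-8.

```python
def transpose(A):
    """Interchange the rows and columns of A (Definition 9-7)."""
    return [list(column) for column in zip(*A)]

def mat_mul(A, B):
    """Product AB, formed by the inner-product rule of Definition 9-6."""
    if len(A[0]) != len(B):
        raise ValueError("matrices are not conformable for multiplication")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Matrices of Example 9-8.
A = [[1, 3], [2, 4]]
B = [[2, -1], [3, 1]]
lhs = transpose(mat_mul(A, B))              # (AB)'
rhs = mat_mul(transpose(B), transpose(A))   # B'A'
print(lhs, rhs)

# A rectangular case: P is (2 x 4) and Q is (4 x 2); both are arbitrary.
P = [[1, 4, 7, 3], [2, -1, 4, -1]]
Q = [[1, 0], [2, 1], [0, 3], [1, 1]]
rect_ok = transpose(mat_mul(P, Q)) == mat_mul(transpose(Q), transpose(P))
print(rect_ok)
```

In the rectangular case the product P'Q' is not even conformable, which is why the order of the factors must be reversed.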
9-3 Determinants
The notion of a determinant, when first introduced in Chapter 4, was that of
a single number associated with a square array of numbers. In its subsequent
application in that chapter it was used in a subsidiary role to simplify the
manipulation of the vector product, and in that capacity it gave rise to a
vector. The determinant made yet another appearance in Chapter 5 when, in
connection with the change of variable in partial differentiation, it contained
functions as elements, and was called a Jacobian. In this role it is often called
a functional determinant, and it gives rise to a function that is closely related
to the one-to-one nature of the change of variables involved.
These are but two of the situations in which determinants occur in
different branches of mathematics, and it is the object of this section to
examine some of the most important algebraic properties of determinants.
Our results will only be proved for determinants of order 3 but they are, in
fact, all true for determinants of any order.
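We begin by rewriting Definition 4-16 using the matrix element notation
as follows:
definition 9-8 (third order determinant) Let A be the square matrix of
order (3 x 3)
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.
\]
Then the expression
\[
|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
\]
is called the third order determinant associated with the square matrix A,
and it is defined to be the number
\[
|A| = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix},
\]
where for any numbers a, b, c, and d,
\[
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.
\]
The notation det A is also frequently used in place of | A | to signify the
determinant of A.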
This definition has a number of consequences of considerable value in
simplifying the manipulation of determinants. Let us confine attention to the
third order determinant which is typical of all orders of determinant, and
expand the last line of Definition 9-8. We have
\[
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31}
- a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31},
\tag{9-18}
\]
showing that one, and only one, element of each row and each column of the
determinant appears in each of the products on the right-hand side defining
| A |. Hence, if any row or column of a determinant is multiplied by a factor
$\lambda$, then the value of the determinant is multiplied by $\lambda$, since a factor $\lambda$ will
appear in each product on the right-hand side of Eqn (9-18). Conversely,
if any row or column of a determinant is divided by a factor $\lambda$, then the value
of the determinant is divided by $\lambda$. It is also obvious from Eqn (9-18) that
| A | = 0 if all the elements of a row or column of | A | are zero, or if all the
corresponding elements of two rows or columns of | A | are equal.
Suppose, for example, that $\lambda = 3$ and
\[
|A| = \begin{vmatrix} 1 & 2 & 3 \\ 2 & 1 & 1 \\ 4 & 1 & 2 \end{vmatrix}.
\]
Then it is easily shown that | A | = -5, so that 3 | A | = -15. Now this
result could have been obtained equally well by using the above argument
and multiplying any row or any column of | A | by 3. If the first row of | A | is
multiplied by 3 we have
\[
3|A| = \begin{vmatrix} 3 & 6 & 9 \\ 2 & 1 & 1 \\ 4 & 1 & 2 \end{vmatrix} = -15
\]
or, alternatively, if the third column is multiplied by 3 we have
\[
3|A| = \begin{vmatrix} 1 & 2 & 9 \\ 2 & 1 & 3 \\ 4 & 1 & 6 \end{vmatrix} = -15.
\]
It is readily verified from Eqn (9-18) that interchanging any two rows or
columns of | A | changes its sign. Thus we have
\[
\begin{vmatrix} 1 & 4 & 3 \\ 2 & 1 & 4 \\ 9 & 4 & -6 \end{vmatrix}
= -\begin{vmatrix} 1 & 3 & 4 \\ 2 & 4 & 1 \\ 9 & -6 & 4 \end{vmatrix}
\]
in which the determinant on the left has been obtained from the one on the
right by interchanging the second and third columns.
A particularly simple case arises when | A | is the determinant associated
with a diagonal matrix A, for then all off-diagonal elements are automatically
zero. This implies that Eqn (9-18) reduces to $|A| = a_{11}a_{22}a_{33}$, which is just
the product of the elements of the principal diagonal. Thus if
\[
|A| = \begin{vmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 4 \end{vmatrix}
\]
then | A | = (3)(-2)(4) = -24.
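Another useful result is that the value of a determinant is unchanged when
elements of a row (or column) have added to them some multiple of the
corresponding elements of some other row (or column). We prove this result
by direct expansion in the following typical case. Consider the determinant
| D | obtained from | A | by adding to the elements of column 3 of | A |, $\lambda$
times the corresponding elements in column 2 of | A | to obtain:
\[
|D| = \begin{vmatrix} a_{11} & a_{12} & a_{13} + \lambda a_{12} \\ a_{21} & a_{22} & a_{23} + \lambda a_{22} \\ a_{31} & a_{32} & a_{33} + \lambda a_{32} \end{vmatrix}.
\]
Then at once Definition 9-8 asserts that
\[
\begin{aligned}
|D| ={}& a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
+ a_{11}\begin{vmatrix} a_{22} & \lambda a_{22} \\ a_{32} & \lambda a_{32} \end{vmatrix}
- a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} \\
&- a_{12}\begin{vmatrix} a_{21} & \lambda a_{22} \\ a_{31} & \lambda a_{32} \end{vmatrix}
+ a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
+ \lambda a_{12}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.
\end{aligned}
\]
Now the second term on the right-hand side is zero, whilst the fourth and
last terms cancel leaving only three remaining terms. These are seen to comprise
the definition of | A |, so that we have proved that | D | = | A | or, in
symbols, that
\[
\begin{vmatrix} a_{11} & a_{12} & a_{13} + \lambda a_{12} \\ a_{21} & a_{22} & a_{23} + \lambda a_{22} \\ a_{31} & a_{32} & a_{33} + \lambda a_{32} \end{vmatrix}
= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.
\]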
A similar result would have been obtained had different columns been used
or, indeed, had rows been used instead of columns.
An obvious implication of this result is that if a row (or column) of a
determinant is expressible as the sum of multiples of other rows (or columns)
of the determinant, then the value of the determinant must be zero. This is so
because by subtraction of this sum of multiples of other rows (or columns)
from the row (or column) in question, it is possible to produce a row (or
column) containing only zero elements.
Let us illustrate how a determinant may be simplified by means of this
result. Consider the determinant
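\[
|A| = \begin{vmatrix} 7 & 18 & 8 \\ 1 & 5 & 7 \\ 3 & 9 & 4 \end{vmatrix}.
\]
Subtracting twice the third row from the first row we find
\[
|A| = \begin{vmatrix} 1 & 0 & 0 \\ 1 & 5 & 7 \\ 3 & 9 & 4 \end{vmatrix},
\]
whence | A | = -43.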
Let us summarize our findings in the form of a theorem.
theorem 9-4 (properties of determinants)
(a) A determinant in which all the elements of a row or column are zero,
itself has the value zero ;
(b) A determinant in which all corresponding elements in two rows (or
columns) are equal has the value zero ;
(c) If the elements of a row (or column) of a determinant are multiplied
by a factor $\lambda$, then the value of the determinant is multiplied by $\lambda$;
(d) The value of a determinant associated with a diagonal matrix is
equal to the product of the elements on the principal diagonal ;
(e) The value of a determinant is unaltered by adding to the elements of
any row (or column), a constant multiple of the corresponding elements
of any other row (or column) ;
(f) If a row (or column) of a determinant is expressible as the sum of
multiples of other rows (or columns) of the determinant, then its value is
zero.
Higher order determinants can be defined with exactly similar properties
to those enumerated in the theorem above. Thus the determinant | A | of
order n associated with the square matrix A of order (n x n) has n! terms in
its expansion, each of which contains one, and only one, element from each
row and column of A.
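definition 9-9 (fourth order determinant) If A is the square matrix of
order (4 x 4)
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}
\]
then the expression
\[
|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix}
\]
is called the fourth order determinant associated with the square matrix A,
and it is defined to be the number
\[
\begin{aligned}
|A| ={}& a_{11}\begin{vmatrix} a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \\ a_{42} & a_{43} & a_{44} \end{vmatrix}
- a_{12}\begin{vmatrix} a_{21} & a_{23} & a_{24} \\ a_{31} & a_{33} & a_{34} \\ a_{41} & a_{43} & a_{44} \end{vmatrix} \\
&+ a_{13}\begin{vmatrix} a_{21} & a_{22} & a_{24} \\ a_{31} & a_{32} & a_{34} \\ a_{41} & a_{42} & a_{44} \end{vmatrix}
- a_{14}\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{vmatrix}.
\end{aligned}
\]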
An inductive argument applied to Definitions 9-8 and 9-9 shows one way
in which higher order determinants may be defined, but clearly our notation
needs some simplification to avoid unwieldy expressions of the type given
above. This is achieved by the introduction of the minor and the cofactor of
an element of a square matrix.
definition 9-10 (minors and cofactors) Let A be a square matrix of
order (n x n) with general element $a_{ij}$, and let | A | be the determinant of
order n associated with A. Denote by $M_{ij}$ the determinant of order (n - 1)
associated with the matrix of order ((n - 1) x (n - 1)) derived from A by the
deletion of row i and column j. Then $M_{ij}$ is called the minor of the element
$a_{ij}$ of A, and $A_{ij} = (-1)^{i+j}M_{ij}$ is called the cofactor of the element $a_{ij}$ of A.
Example 9-9 Find the minors and cofactors of the matrix
\[
A = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 4 \\ 1 & 2 & 1 \end{bmatrix}.
\]
Solution The minor $M_{11}$ is derived from A by deleting row 1 and column 1
and equating $M_{11}$ to the determinant formed by the remaining elements.
That is,
\[
M_{11} = \begin{vmatrix} 1 & 4 \\ 2 & 1 \end{vmatrix} = -7.
\]
Similarly, minor $M_{12}$ is derived from A by deleting row 1 and column 2 and
equating $M_{12}$ to the determinant formed by the remaining elements. That is,
\[
M_{12} = \begin{vmatrix} 2 & 4 \\ 1 & 1 \end{vmatrix} = -2.
\]
Identical reasoning then shows that $M_{13} = 3$, $M_{21} = -6$, $M_{22} = -2$,
$M_{23} = 2$, $M_{31} = -3$, $M_{32} = -2$, and $M_{33} = 1$. As the cofactors
$A_{ij} = (-1)^{i+j}M_{ij}$, it follows that $A_{11} = -7$, $A_{12} = 2$, $A_{13} = 3$, $A_{21} = 6$, $A_{22} = -2$,
$A_{23} = -2$, $A_{31} = -3$, $A_{32} = 2$, and $A_{33} = 1$.
If A is a square matrix with general element $a_{ij}$ and corresponding cofactor
$A_{ij}$, it is easily seen that:
(a) if A is of order (2 x 2), then $|A| = a_{11}A_{11} + a_{12}A_{12}$,
(b) if A is of order (3 x 3), then $|A| = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13}$,
(c) if A is of order (4 x 4), then $|A| = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13} + a_{14}A_{14}$.
This suggests that if A is of order (n x n), then for | A | we could adopt the
definition
\[
|A| = a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n}.
\tag{9-19}
\]
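This is a true statement and could be accepted as a definition, but it is
not the most general one which may be adopted. To see this we return to
Eqn (9-18) and re-arrange the terms on the right-hand side to give
\[
|A| = a_{31}(a_{12}a_{23} - a_{13}a_{22}) - a_{32}(a_{11}a_{23} - a_{13}a_{21})
+ a_{33}(a_{11}a_{22} - a_{12}a_{21}).
\]
Hence, working backwards, we have
\[
|A| = a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}
- a_{32}\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}
+ a_{33}\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix},
\]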
thereby showing that it is also true that
\[
|A| = a_{31}A_{31} + a_{32}A_{32} + a_{33}A_{33}.
\tag{9-20}
\]
We now have two equivalent but different looking expressions for | A |
either of which could be taken as the definition of | A | . The expression in
(b) above involves the elements and cofactors of the first row of A and the
expression in Eqn (9-20) involves the elements and cofactors of the third row
of A. A repetition of this argument involving other rearrangements of the
terms of Eqn (9-18) shows that | A | may be evaluated as the sum of the
products of the elements and their cofactors of any row or column of A. This
very valuable and general result is known as the Laplace expansion theorem,
and it is true for determinants of any order though we have only proved it
for a third order determinant. Let us state this result formally as it would
apply to a determinant of order n.
theorem 9-5 (Laplace expansion theorem) The determinant | A | associated
with any (n x n) square matrix A is obtained by summing the products of
the elements and their cofactors in any row or column of A. If A has the
general element $a_{ij}$ and the corresponding cofactor is $A_{ij}$, then this result is
equivalent to:
Expansion by elements of a row
\[
|A| = \sum_{j=1}^{n} a_{ij}A_{ij} \quad\text{for } i = 1, 2, \ldots, n;
\]
Expansion by elements of a column
\[
|A| = \sum_{i=1}^{n} a_{ij}A_{ij} \quad\text{for } j = 1, 2, \ldots, n.
\]
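Example 9-10 Evaluate the determinant
\[
|A| = \begin{vmatrix} 1 & 4 & 2 \\ 3 & -2 & 1 \\ 1 & 5 & 2 \end{vmatrix}
\]
by expanding it (a) in terms of the elements of row 2, and (b) in terms of the
elements of column 3.
Solution
\[
\text{(a)}\quad |A| = -3\begin{vmatrix} 4 & 2 \\ 5 & 2 \end{vmatrix}
- 2\begin{vmatrix} 1 & 2 \\ 1 & 2 \end{vmatrix}
- 1\begin{vmatrix} 1 & 4 \\ 1 & 5 \end{vmatrix} = 5;
\]
\[
\text{(b)}\quad |A| = 2\begin{vmatrix} 3 & -2 \\ 1 & 5 \end{vmatrix}
- 1\begin{vmatrix} 1 & 4 \\ 1 & 5 \end{vmatrix}
+ 2\begin{vmatrix} 1 & 4 \\ 3 & -2 \end{vmatrix} = 5.
\]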
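Theorem 9-5 suggests a simple recursive procedure for evaluating a determinant of any order. The sketch below is illustrative only: the function names are introduced here, indices are counted from 0 rather than 1, and the recursive calls always expand the smaller determinants along their first row. Because the number of products grows like n!, the method is practical only for small orders.

```python
def minor_matrix(A, i, j):
    """Matrix left after deleting row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_by_row(A, row=0):
    """Determinant of A by Laplace expansion along the chosen row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (row + j) * A[row][j] * det_by_row(minor_matrix(A, row, j))
               for j in range(n))

def det_by_column(A, col=0):
    """Determinant of A by Laplace expansion along the chosen column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + col) * A[i][col] * det_by_row(minor_matrix(A, i, col))
               for i in range(n))

# Determinant of Example 9-10.
A = [[1, 4, 2], [3, -2, 1], [1, 5, 2]]
print(det_by_row(A, 1))     # expansion by row 2 of the text: 5
print(det_by_column(A, 2))  # expansion by column 3 of the text: 5
```

Every choice of row or column gives the same value, as the theorem asserts.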
An important extension of Theorem 9-5 asserts that the sum of the
products of the elements of any row (or column) of a square matrix A with
the cofactors corresponding to the elements of a different row (or column) is
zero. This is easily proved as follows.
Let A be a matrix of order (n x n), and let B be obtained from A by replacing
row q of A by row p. Then B has the elements of rows p and q equal,
so that by Theorem 9-4 (b) it follows that | B | = 0. The cofactors of the
elements of row q are the same for B as for A, since they do not involve the
elements of row q itself. Expanding | B | in
terms of elements of row q by Theorem 9-5 we then find
\[
|B| = a_{p1}A_{q1} + a_{p2}A_{q2} + \cdots + a_{pn}A_{qn} = 0,
\]
which was to be proved. A similar argument establishes the corresponding
result for columns and so we have proved our assertion.
theorem 9-6 The sum of the products of the elements of any row (or
column) of a square matrix A with the cofactors corresponding to the elements
of a different row (or column) is zero. Symbolically, if $a_{ij}$ is the general
element of A and $A_{ij}$ is its cofactor, then:
Expansion by elements of a row
\[
\sum_{i=1}^{n} a_{pi}A_{qi} = 0 \quad\text{if } p \neq q; \text{ and}
\]
Expansion by elements of a column
\[
\sum_{i=1}^{n} a_{ip}A_{iq} = 0 \quad\text{if } p \neq q.
\]
Example 9-11 Verify that the sum of the products of the elements of
column 1 and the corresponding cofactors of column 2 of the following
matrix is zero:
\[
A = \begin{bmatrix} 1 & 3 & 2 \\ 4 & 1 & 2 \\ 3 & 1 & 3 \end{bmatrix}.
\]
Solution The elements of column 1 are $a_{11} = 1$, $a_{21} = 4$, $a_{31} = 3$. The
cofactors corresponding to the elements of the second column are $A_{12} = -6$,
$A_{22} = -3$, $A_{32} = 6$. Hence
\[
a_{11}A_{12} + a_{21}A_{22} + a_{31}A_{32} = (1)(-6) + (4)(-3) + (3)(6) = 0.
\]
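The contrast between Theorems 9-5 and 9-6 can be seen by forming both kinds of sum for the matrix of Example 9-11. This is only an illustrative sketch; the helper names are introduced here and indices are counted from 0, so that column 1 of the text is index 0.

```python
def minor_matrix(A, i, j):
    """Matrix left after deleting row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row (Theorem 9-5)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(len(A)))

def cofactor(A, i, j):
    """Cofactor of the element in row i, column j (Definition 9-10)."""
    return (-1) ** (i + j) * det(minor_matrix(A, i, j))

A = [[1, 3, 2], [4, 1, 2], [3, 1, 3]]

# Elements of column 1 with the cofactors of column 2: the sum vanishes.
mixed = sum(A[i][0] * cofactor(A, i, 1) for i in range(3))

# Elements of column 2 with their own cofactors: the sum is | A |.
matched = sum(A[i][1] * cofactor(A, i, 1) for i in range(3))

print(mixed, matched, det(A))  # 0 -15 -15
```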
9-4 Linear dependence and linear independence
We are now in a position to discuss the important idea of linear independence.
This concept has already been used implicitly in Chapter 4 when the three
mutually orthogonal unit vectors i, j, and k were introduced comprising what
in linear algebra is called a basis for the vector space. By this we mean that
all other vectors are expressible in terms of the vectors comprising the basis
through the operations of scaling and vector addition, but that no member
of the basis itself is expressible in terms of the other members of the basis.
Thus no choice of the scalars $\lambda$, $\mu$ can ever make the vectors i and $\lambda$j + $\mu$k
equal. It is in this sense that the unit vectors i, j, k comprising the basis for
ordinary vector analysis are linearly independent, and obviously any other
set of unit vectors a, b, c which are not co-planar, and no two of which are
parallel, would serve equally well as a basis for this space.
The same idea carries across to matrices when the term vector is inter-
preted to mean either a matrix row vector or a matrix column vector. Thus
the three column vectors
Ci =
c 2 =
and C3 =
are not linearly independent because $C_3 = C_1 + 2C_2$, whereas the three row
vectors
\[
R_1 = [1 \;\; 0 \;\; 0], \quad R_2 = [0 \;\; 1 \;\; 0], \quad\text{and}\quad R_3 = [0 \;\; 0 \;\; 1]
\]
are obviously linearly independent, because no choice of the scalars $\lambda$, $\mu$ can
ever make the vectors $R_1$ and $\lambda R_2 + \mu R_3$ equal. It is these ideas that underlie
the formulation of the following definition.
definition 9-11 (linear dependence and linear independence) The set of
n matrix row or column vectors $V_1, V_2, \ldots, V_n$ which are conformable for
addition will be said to be linearly dependent if there exist n scalars $\alpha_1, \alpha_2,
\ldots, \alpha_n$, not all zero, such that
\[
\alpha_1V_1 + \alpha_2V_2 + \cdots + \alpha_nV_n = 0.
\]
When no such set of scalars exists, so that this relationship is only true
when $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$, then the n matrix vectors $V_1, V_2, \ldots, V_n$
will be said to be linearly independent.
In the event that the n matrix vectors in Definition 9-11 represent the
rows or columns of a rectangular matrix A, the linear dependence or inde-
pendence of the vectors $V_1, V_2, \ldots, V_n$ becomes a statement about the
linear dependence or independence of the rows or columns of A. In particular,
if A is a square matrix, and linear dependence exists between its rows (or
columns), then by definition it is possible to express at least one row (or
column) of A as the sum of multiples of the other rows (or columns). Thus
from Theorem 9-4 (f), we see that linear dependence amongst the rows
or columns of a square matrix A implies the condition | A | = 0. Similarly,
if $|A| \neq 0$ then the rows and columns of A cannot be linearly dependent.
theorem 9-7 (test for linear independence) The rows and columns of a
square matrix A are linearly independent if, and only if, $|A| \neq 0$. Conversely,
linear dependence is implied between rows or columns of a square matrix
A if | A | = 0.
Example 9-12 Test the following matrices for linear independence between
rows or columns:
\[
A = \begin{bmatrix} 1 & 4 & 3 \\ 2 & 10 & 7 \\ 4 & -6 & 1 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 1 & 1 & 0 \\ 3 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix}.
\]
Solution We shall apply Theorem 9-7 by examining | A | and | B |. A simple
calculation shows that | A | = 0, so that linear dependence exists between
either the rows or the columns of A. In fact, denoting the columns of A by
$C_1$, $C_2$, and $C_3$, we have $C_2 = 2(C_3 - C_1)$. As | B | = -3 the rows and
columns of B are linearly independent.
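The test of Theorem 9-7 is easily mechanized for matrices of order (3 x 3). The sketch below is illustrative only; the function names are introduced here, and the middle element of A is taken to be 10, the value for which the stated relation between the columns, $C_2 = 2(C_3 - C_1)$, actually holds.

```python
def det3(A):
    """Third order determinant, expanded as in Definition 9-8."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def is_independent(A):
    """True when the rows and columns of the (3 x 3) matrix A are linearly independent."""
    return det3(A) != 0

A = [[1, 4, 3], [2, 10, 7], [4, -6, 1]]
B = [[1, 1, 0], [3, 2, 1], [1, 1, 3]]

print(det3(A), is_independent(A))  # 0 False
print(det3(B), is_independent(B))  # -3 True

# Check the stated dependence between the columns of A directly.
C1, C2, C3 = zip(*A)
relation_holds = all(c2 == 2 * (c3 - c1) for c1, c2, c3 in zip(C1, C2, C3))
print(relation_holds)
```

The comparison with zero is exact here because the elements are integers; with floating-point elements a small tolerance would be needed instead.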
Let us now give consideration to any linear independence that may exist
between the rows or columns of a rectangular matrix A of order (m x n). If
r rows (or columns) of A are linearly independent, where $r \le \min(m, n)$,
then Theorem 9-7 implies that there is at least one determinant of order r
that may be formed by taking these r rows (or columns) which is non-zero,
but that all determinants of order greater than r must of necessity vanish.
This number r is called the rank of the matrix A, and it represents the greatest
number of linearly independent rows or columns existing in A. If, for example,
A is a square matrix of order (n x n) and $|A| \neq 0$, this implies that the
rank of A must be n.
definition 9-12 (rank of a matrix) The rank r of a matrix A is the greatest
number of linearly independent rows or columns that exist in the matrix A.
Numerically, r is equal to the order of the largest order non-vanishing determinant
| B | associated with any square matrix B which can be constructed
from A by combination of r rows and r columns.
Example 9-13 Find the rank of the following matrix:
\[
A = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & -1 & 1 \\ -3 & 0 & 1 & -1 & 0 \end{bmatrix}.
\]
Solution The largest order of determinant that can be constructed in this
case from the rows and columns of A is 3. As there is certainly one such
determinant that is non-vanishing, namely the one associated with the first
three columns of A, the rank of A must be 3. The fact that other non-vanishing
determinants of order three may be constructed from A is immaterial (e.g.,
take the last three columns).
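As a hedged illustration, the rank can also be computed by row reduction instead of by searching for a non-vanishing determinant. The sketch below relies on the fact, used later in Section 9-7, that elementary row operations do not alter the rank. The function name `rank` is chosen here, and the matrix is one reading of Example 9-13 (the sign of its lower left-hand entry is an assumption; the rank is 3 for either sign).

```python
from fractions import Fraction


def rank(matrix):
    """Number of non-zero rows left after forward elimination (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0                                   # pivots located so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                            # no pivot in this column
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(found + 1, len(rows)):
            factor = rows[i][col] / rows[found][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[found])]
        found += 1
        if found == len(rows):
            break
    return found


A = [[1, 0, 1, 0, 1],
     [1, 1, 1, -1, 1],
     [-3, 0, 1, -1, 0]]
print(rank(A))
```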
9-5 Inverse and adjoint matrix
The operation of division is not defined for matrices, but a multiplicative
inverse matrix denoted by $A^{-1}$ can be defined for any square matrix A for
which $|A| \neq 0$. This multiplicative inverse $A^{-1}$ is unique and has the
property that
\[
A^{-1}A = AA^{-1} = I
\]
where I is the unit matrix, and it is defined in terms of what is called the
matrix adjoint to A. The uniqueness follows from the fact that if B and C are
each inverse to A, then B(AC) = (BA)C, so that BI = IC, or B = C.
Definition 9-13 (adjoint matrix) Let A be a square matrix, then the
transpose of the matrix of cofactors of A is called the matrix adjoint to A,
and it is denoted by adj A. A square matrix and its adjoint are both of the
same order.
Example 9-14 Find the matrix adjoint to:
\[
A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 1 & 0 \\ 2 & 1 & 2 \end{bmatrix}.
\]
Solution The cofactors $A_{ij}$ of A are: $A_{11} = 2$, $A_{12} = -6$, $A_{13} = 1$,
$A_{21} = -3$, $A_{22} = 0$, $A_{23} = 3$, $A_{31} = -1$, $A_{32} = 3$, and $A_{33} = -5$. Hence the
matrix of cofactors has the form
\[
\begin{bmatrix} 2 & -6 & 1 \\ -3 & 0 & 3 \\ -1 & 3 & -5 \end{bmatrix}
\]
so that its transpose, which by definition is adj A, is
\[
\operatorname{adj} A = \begin{bmatrix} 2 & -3 & -1 \\ -6 & 0 & 3 \\ 1 & 3 & -5 \end{bmatrix}.
\]
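Now from Theorems 9-5 and 9-6, we see that the effect of forming either
the product (adj A)A or the product A(adj A) is to produce a diagonal
matrix in which each element of the leading diagonal is $|A|$. That is, we
have shown that
\[
(\operatorname{adj} A)A = A(\operatorname{adj} A) =
\begin{bmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{bmatrix}
\tag{9-21}
\]
whence
\[
(\operatorname{adj} A)A = A(\operatorname{adj} A) = |A|\,I.
\tag{9-22}
\]
Thus, provided $|A| \neq 0$, by writing
\[
A^{-1} = \frac{\operatorname{adj} A}{|A|}
\]
we arrive at the result
\[
A^{-1}A = AA^{-1} = I.
\tag{9-23}
\]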
The matrix $A^{-1}$ is called the matrix inverse to A and it is only defined for
square matrices A for which $|A| \neq 0$. A square matrix whose associated
determinant is non-vanishing is called a non-singular matrix. Although the
inverse matrix is only defined for non-singular square matrices, the adjoint
matrix is defined for any square matrix, irrespective of whether or not it is
non-singular.
Definition 9-14 (inverse matrix) If A is a square matrix for which
$|A| \neq 0$, the matrix inverse to A which is denoted by $A^{-1}$ is defined by the
relationship
\[
A^{-1} = \frac{\operatorname{adj} A}{|A|}.
\]
Example 9-15 Find the matrix inverse to the matrix A of Example 9-14
above.
Solution It is easily found from the cofactors already computed that
$|A| = -9$. This follows, for example, by expanding $|A|$ in terms of
elements of the first row to obtain $|A| = (1)(2) + (2)(-6) + (1)(1) = -9$.
Hence from Definition 9-14, we have
\[
A^{-1} = \frac{\operatorname{adj} A}{|A|}
= \left(-\tfrac{1}{9}\right)\begin{bmatrix} 2 & -3 & -1 \\ -6 & 0 & 3 \\ 1 & 3 & -5 \end{bmatrix}
= \begin{bmatrix} -2/9 & 1/3 & 1/9 \\ 2/3 & 0 & -1/3 \\ -1/9 & -1/3 & 5/9 \end{bmatrix}.
\]
The steps in the determination of an inverse matrix are perhaps best
remembered in the form of a rule.
Rule 2 (Determination of inverse matrix)
To determine the matrix $A^{-1}$ which is inverse to the square matrix A proceed
as follows:
(a) Construct the matrix of cofactors of A;
(b) Transpose the matrix of cofactors of A to obtain adj A;
(c) Calculate $|A|$ and, if it is not zero, divide adj A by $|A|$ to obtain
$A^{-1}$;
(d) If $|A| = 0$, then $A^{-1}$ is not defined.
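The steps of Rule 2 can be sketched as follows. This is a minimal illustration rather than a recommended numerical method; the names `det`, `cofactor_matrix`, `adjoint`, and `inverse` are chosen here, exact rational arithmetic is used so that the result can be compared with Example 9-15, and `None` is returned when $|A| = 0$.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def cofactor_matrix(m):
    """Step (a): the matrix whose (i, j) element is the cofactor of m[i][j]."""
    n = len(m)
    return [[(-1) ** (i + j)
             * det([row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i])
             for j in range(n)] for i in range(n)]


def adjoint(m):
    """Step (b): transpose of the matrix of cofactors."""
    return [list(col) for col in zip(*cofactor_matrix(m))]


def inverse(m):
    """Steps (c) and (d): divide adj m by |m|, or return None if |m| = 0."""
    d = det(m)
    if d == 0:
        return None
    return [[Fraction(x, d) for x in row] for row in adjoint(m)]


A = [[1, 2, 1], [3, 1, 0], [2, 1, 2]]   # matrix of Example 9-14
print(det(A))
print(adjoint(A))
print(inverse(A))
```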
It is a trivial consequence of Definition 9-14 and the fact that for any
square matrix A, $|A| = |A'|$ (see Problem 9-34), that
\[
(A^{-1})' = (A')^{-1}.
\tag{9-24}
\]
Also, if A and B are non-singular matrices of the same order, then
\[
(B^{-1}A^{-1})AB = I = AB(B^{-1}A^{-1}),
\]
showing that
\[
(AB)^{-1} = B^{-1}A^{-1}.
\tag{9-25}
\]
Accepting the result of Problem 9-35 as being valid for square matrices
A, B of arbitrary order $(n \times n)$, so that $|AB| = |A|\,|B|$, we are able to
prove another useful result concerning the inverse matrix. If $|A| \neq 0$, then
$AA^{-1} = I$ showing that $|AA^{-1}| = 1$, or $|A|\,|A^{-1}| = 1$. It follows from
this that:
\[
|A| = 1/|A^{-1}|.
\tag{9-26}
\]
One final result follows directly from the obvious fact that $(A^{-1})^{-1}A^{-1} = I$,
which is always true provided $|A^{-1}| \neq 0$. If we post-multiply this result
by A we find
\[
(A^{-1})^{-1}A^{-1}A = IA
\]
giving
\[
(A^{-1})^{-1}I = A,
\]
whence
\[
(A^{-1})^{-1} = A.
\tag{9-27}
\]
Theorem 9-8 (properties of inverse matrix) If A and B are non-singular
square matrices of the same order, then:
(a) $AA^{-1} = A^{-1}A = I$;
(b) $(AB)^{-1} = B^{-1}A^{-1}$;
(c) $(A^{-1})' = (A')^{-1}$;
(d) $(A^{-1})^{-1} = A$;
(e) $|A| = 1/|A^{-1}|$.
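Example 9-16 Verify that $(A^{-1})' = (A')^{-1}$, given that
\[
A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}.
\]
Solution We have
\[
A^{-1} = \begin{bmatrix} -2 & 3/2 \\ 1 & -1/2 \end{bmatrix},
\]
so that
\[
(A^{-1})' = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}.
\]
However,
\[
A' = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},
\]
so that
\[
(A')^{-1} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}
\]
confirming that $(A^{-1})' = (A')^{-1}$.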
9-6 Matrix functions of a single variable
All the matrix results that have been obtained so far are equally valid whether
applied to matrices whose elements are numerical constants, or to matrices
whose elements are functions of a single variable t. When the latter is the
case it is convenient to copy the notation for a function used hitherto, and
to represent the matrix by writing A(t). In many respects it is convenient to
regard all matrices in this manner, since matrices with constant number
elements correspond to the subset of all possible matrices A(t) in which all
elements are constant functions.
When the elements of A(t) are all differentiable with respect to t in some
interval, it is reasonable to define a derivative of A(t) with respect to t, and
for this purpose we shall work with the following definition.
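Definition 9-15 (derivative of a matrix) Let A(t) be a matrix of order
$(m \times n)$ whose elements $a_{ij}(t)$ are all differentiable functions of t in some
common interval $t_0 < t < t_1$. Then the derivative of A(t) with respect to t in
$t_0 < t < t_1$, written dA/dt, is defined to be the matrix of order $(m \times n)$ with
elements $da_{ij}/dt$. The matrix A(t) will be said to be differentiable in
$t_0 < t < t_1$. Symbolically this result becomes:
\[
\frac{d}{dt}
\begin{bmatrix}
a_{11}(t) & a_{12}(t) & \cdots & a_{1n}(t) \\
a_{21}(t) & a_{22}(t) & \cdots & a_{2n}(t) \\
\vdots & \vdots & & \vdots \\
a_{m1}(t) & a_{m2}(t) & \cdots & a_{mn}(t)
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{da_{11}}{dt} & \dfrac{da_{12}}{dt} & \cdots & \dfrac{da_{1n}}{dt} \\
\dfrac{da_{21}}{dt} & \dfrac{da_{22}}{dt} & \cdots & \dfrac{da_{2n}}{dt} \\
\vdots & \vdots & & \vdots \\
\dfrac{da_{m1}}{dt} & \dfrac{da_{m2}}{dt} & \cdots & \dfrac{da_{mn}}{dt}
\end{bmatrix}
\]
for $t_0 < t < t_1$.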
Example 9-17 Find dA/dt given that:
\[
A(t) = \begin{bmatrix} \cosh t & \sin t & \cosh 2t \\ \sinh t & \cos t & \sinh 2t \end{bmatrix}.
\]
Solution From Definition 9-15 we have at once:
\[
\frac{dA}{dt} = \begin{bmatrix} \sinh t & \cos t & 2\sinh 2t \\ \cosh t & -\sin t & 2\cosh 2t \end{bmatrix}
\]
for all t.
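If $a_{ij}(t)$ and $b_{ij}(t)$ are differentiable functions in some common interval
$t_0 < t < t_1$, then we know from the work of Chapter 5 that
\[
\frac{d}{dt}(a_{ij} \pm b_{ij}) = \frac{da_{ij}}{dt} \pm \frac{db_{ij}}{dt},
\]
and so
\[
\frac{d}{dt}(a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj})
= \left(\frac{da_{i1}}{dt}b_{1j} + \frac{da_{i2}}{dt}b_{2j} + \cdots + \frac{da_{in}}{dt}b_{nj}\right)
+ \left(a_{i1}\frac{db_{1j}}{dt} + a_{i2}\frac{db_{2j}}{dt} + \cdots + a_{in}\frac{db_{nj}}{dt}\right).
\]
Consequently, it then follows directly from Definitions 9-3 to 9-6 that for
suitably conformable matrices A and B:
\[
\frac{d}{dt}(A \pm B) = \frac{dA}{dt} \pm \frac{dB}{dt};
\tag{9-28}
\]
\[
\frac{d}{dt}(\lambda A) = \lambda\frac{dA}{dt}, \quad\text{for any constant scalar } \lambda;
\tag{9-29}
\]
and
\[
\frac{d}{dt}(AB) = \frac{dA}{dt}B + A\frac{dB}{dt}.
\tag{9-30}
\]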
Notice that in general $dA^2/dt \neq 2A(dA/dt)$, for setting B = A in Eqn (9-30)
yields
\[
\frac{dA^2}{dt} = \frac{dA}{dt}A + A\frac{dA}{dt}.
\tag{9-31}
\]
It also follows that if K is a constant matrix in the sense that its elements are
constant functions of t, then
\[
\frac{dK}{dt} = 0.
\tag{9-32}
\]
Using the results of Theorems 9-3 (d) and 9-8 (b) together with Eqn (9-30),
we can derive two useful results. The first result applies to any two matrices
A, B which are conformable for multiplication and is
\[
\frac{d}{dt}(AB)' = \frac{d}{dt}(B'A') = \frac{dB'}{dt}A' + B'\frac{dA'}{dt};
\tag{9-33}
\]
the second result applies to any two non-singular square matrices which are
conformable for multiplication and is
\[
\frac{d}{dt}(AB)^{-1} = \frac{d}{dt}(B^{-1}A^{-1}) = \frac{dB^{-1}}{dt}A^{-1} + B^{-1}\frac{dA^{-1}}{dt}.
\tag{9-34}
\]
We now summarize these results in the form of a general theorem.
Theorem 9-9 (properties of matrix differentiation) Let A(t) and B(t) be
suitably conformable matrices which are differentiable in some common
interval $t_0 < t < t_1$, and let K be a constant matrix and $\lambda$ a scalar. Then
throughout the interval $t_0 < t < t_1$:
(a) $\dfrac{d}{dt}(A + B) = \dfrac{dA}{dt} + \dfrac{dB}{dt}$;
(b) $\dfrac{d}{dt}(A - B) = \dfrac{dA}{dt} - \dfrac{dB}{dt}$;
(c) $\dfrac{d}{dt}(\lambda A) = \lambda\dfrac{dA}{dt}$; ($\lambda$ a constant scalar)
(d) $\dfrac{d}{dt}(AB) = \dfrac{dA}{dt}B + A\dfrac{dB}{dt}$;
(e) $\dfrac{dK}{dt} = 0$; (K a constant matrix)
(f) $\dfrac{d}{dt}(AB)' = \dfrac{dB'}{dt}A' + B'\dfrac{dA'}{dt}$;
(g) $\dfrac{d}{dt}(AB)^{-1} = \dfrac{dB^{-1}}{dt}A^{-1} + B^{-1}\dfrac{dA^{-1}}{dt}$,
where A and B are non-singular matrices.
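Example 9-18 Verify Eqn (9-33) for the matrices
\[
A(t) = \begin{bmatrix} t & 1 \\ -1 & t^2 \end{bmatrix}
\quad\text{and}\quad
B(t) = \begin{bmatrix} 2 & t^2 \\ t^3 & 1 \end{bmatrix}.
\]
Solution We have
\[
AB = \begin{bmatrix} 2t + t^3 & 1 + t^3 \\ t^5 - 2 & 0 \end{bmatrix}
\]
so that
\[
(AB)' = \begin{bmatrix} 2t + t^3 & t^5 - 2 \\ 1 + t^3 & 0 \end{bmatrix}
\]
and thus
\[
\frac{d}{dt}(AB)' = \begin{bmatrix} 2 + 3t^2 & 5t^4 \\ 3t^2 & 0 \end{bmatrix}.
\]
Now,
\[
A'(t) = \begin{bmatrix} t & -1 \\ 1 & t^2 \end{bmatrix}
\quad\text{and}\quad
B'(t) = \begin{bmatrix} 2 & t^3 \\ t^2 & 1 \end{bmatrix}
\]
so that
\[
\frac{dA'}{dt} = \begin{bmatrix} 1 & 0 \\ 0 & 2t \end{bmatrix}
\quad\text{and}\quad
\frac{dB'}{dt} = \begin{bmatrix} 0 & 3t^2 \\ 2t & 0 \end{bmatrix}.
\]
Using these results we have
\[
\frac{dB'}{dt}A' + B'\frac{dA'}{dt}
= \begin{bmatrix} 0 & 3t^2 \\ 2t & 0 \end{bmatrix}\begin{bmatrix} t & -1 \\ 1 & t^2 \end{bmatrix}
+ \begin{bmatrix} 2 & t^3 \\ t^2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 2t \end{bmatrix}
= \begin{bmatrix} 3t^2 & 3t^4 \\ 2t^2 & -2t \end{bmatrix}
+ \begin{bmatrix} 2 & 2t^4 \\ t^2 & 2t \end{bmatrix}
= \begin{bmatrix} 2 + 3t^2 & 5t^4 \\ 3t^2 & 0 \end{bmatrix}
= \frac{d}{dt}(AB)'.
\]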
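A quick numerical spot check of Eqn (9-33) is sketched below. It assumes the reading $A(t) = \begin{bmatrix} t & 1 \\ -1 & t^2 \end{bmatrix}$, $B(t) = \begin{bmatrix} 2 & t^2 \\ t^3 & 1 \end{bmatrix}$ for the matrices of Example 9-18, with the element derivatives entered by hand; all function names are chosen here for illustration. Agreement at a few values of t is a sanity check only, not a proof.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(p, q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]


def matadd(p, q):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(p, q)]


# Assumed matrices of Example 9-18 and their element-wise derivatives.
def A(t):  return [[t, 1], [-1, t ** 2]]
def B(t):  return [[2, t ** 2], [t ** 3, 1]]
def dA(t): return [[1, 0], [0, 2 * t]]
def dB(t): return [[0, 2 * t], [3 * t ** 2, 0]]


def lhs(t):
    """d/dt (AB)' obtained by differentiating the elements of (AB)' directly."""
    return [[2 + 3 * t ** 2, 5 * t ** 4], [3 * t ** 2, 0]]


def rhs(t):
    """(dB'/dt) A' + B' (dA'/dt), the right-hand side of Eqn (9-33)."""
    return matadd(matmul(transpose(dB(t)), transpose(A(t))),
                  matmul(transpose(B(t)), transpose(dA(t))))


for t in (Fraction(-3, 2), 0, 1, 2):
    print(t, lhs(t) == rhs(t))
```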
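9-7 Solution of systems of linear equations
A system of m linear inhomogeneous equations in the n variables $x_1, x_2,
\ldots, x_n$ has the general form
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= k_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= k_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= k_m,
\end{aligned}
\tag{9-35}
\]
where the term inhomogeneous refers to the fact that not all of the numbers
$k_1, k_2, \ldots, k_m$ are zero. Defining the matrices
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix},
\quad
X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\quad\text{and}\quad
K = \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{bmatrix},
\]
this system can be written
\[
AX = K.
\tag{9-36}
\]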
Here A is called the coefficient matrix, X the solution vector, and K the
inhomogeneous vector.
In the event that m = n and $|A| \neq 0$ it follows that $A^{-1}$ exists, so that
pre-multiplication of Eqn (9-36) by $A^{-1}$ gives for the solution vector,
\[
X = A^{-1}K.
\tag{9-37}
\]
This method of solution is of more theoretical than practical interest because
the task of computing A -1 becomes prohibitive when n is much greater than
three. However, one useful method of solution for small systems of such
equations (n < 4) known as Cramer's rule may be deduced from Eqn (9-37).
Consideration of Eqn (9-37) and Definition 9-14 shows that $x_i$, the ith
element in the solution vector X, is given by
\[
x_i = \frac{k_1A_{1i} + k_2A_{2i} + \cdots + k_nA_{ni}}{|A|}
\tag{9-38}
\]
for i = 1, 2, ..., n, where $A_{ij}$ is the cofactor of A corresponding to element
$a_{ij}$. Using Laplace's expansion theorem we then see that the numerator of
Eqn (9-38) is simply the expansion of $|A_i|$, where $A_i$ denotes the matrix
derived from A by replacing the ith column of A by the column vector K.
Thus we have derived the simple result
\[
x_i = \frac{|A_i|}{|A|} \quad\text{for } i = 1, 2, \ldots, n,
\tag{9-39}
\]
which expresses the elements of the solution vector X of Eqn (9-35) in terms
of determinants.
Rule 3 (Cramer's rule)
To solve n linear inhomogeneous equations in n variables proceed as follows:
(a) Compute $|A|$ the determinant of the coefficient matrix and, if
$|A| \neq 0$, proceed to the next step;
(b) Compute the modified coefficient determinants $|A_i|$, i = 1, 2, ...,
n where $A_i$ is derived from A by replacing the ith column of A by the
inhomogeneous vector K;
(c) Then the solutions $x_1, x_2, \ldots, x_n$ are given by
\[
x_i = \frac{|A_i|}{|A|} \quad\text{for } i = 1, 2, \ldots, n;
\]
(d) If $|A| = 0$ the method fails.
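Example 9-19 Use Cramer's rule to solve the equations:
\[
\begin{aligned}
x_1 + 3x_2 + x_3 &= 8 \\
2x_1 + x_2 + 3x_3 &= 7 \\
x_1 + x_2 - x_3 &= 2.
\end{aligned}
\]
Solution The coefficient matrix A and the modified coefficient matrices
$A_1$, $A_2$, and $A_3$ are obviously:
\[
A = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 1 & 3 \\ 1 & 1 & -1 \end{bmatrix},
\quad
A_1 = \begin{bmatrix} 8 & 3 & 1 \\ 7 & 1 & 3 \\ 2 & 1 & -1 \end{bmatrix},
\quad
A_2 = \begin{bmatrix} 1 & 8 & 1 \\ 2 & 7 & 3 \\ 1 & 2 & -1 \end{bmatrix}
\quad\text{and}\quad
A_3 = \begin{bmatrix} 1 & 3 & 8 \\ 2 & 1 & 7 \\ 1 & 1 & 2 \end{bmatrix}.
\]
Hence $|A| = 12$, $|A_1| = 12$, $|A_2| = 24$, and $|A_3| = 12$, so that
\[
x_1 = \frac{|A_1|}{|A|} = 1, \quad x_2 = \frac{|A_2|}{|A|} = 2, \quad x_3 = \frac{|A_3|}{|A|} = 1.
\]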
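Cramer's rule translates almost line for line into code. The following is a small sketch under the assumption that exact integer or rational coefficients are supplied; the names `det` and `cramer` are chosen here, and `None` is returned when $|A| = 0$, the case in which the rule fails.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def cramer(a, k):
    """Solve a x = k by Cramer's rule; return None if |a| = 0."""
    d = det(a)                                   # step (a)
    if d == 0:
        return None                              # step (d): the method fails
    solution = []
    for i in range(len(a)):
        # Step (b): replace the ith column of a by the vector k.
        a_i = [row[:i] + [k[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(a_i), d))   # step (c)
    return solution


# The system x1 + 3x2 + x3 = 8, 2x1 + x2 + 3x3 = 7, x1 + x2 - x3 = 2.
A = [[1, 3, 1], [2, 1, 3], [1, 1, -1]]
K = [8, 7, 2]
print(cramer(A, K))
```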
In the more general case in which m = n, but $|A| = 0$, the inverse
matrix does not exist and so any method using $A^{-1}$ must fail. In these
circumstances we must consider more carefully what is meant by a solution. In
general, when a solution vector X exists whose elements simultaneously
satisfy all the equations in the system, the equations will be said to be
consistent. If no solution vector exists having this property then the equations
will be said to be inconsistent. Consider the following equations:
\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= 9 \\
4x_1 - 2x_2 + x_3 &= 4 \\
5x_1 - x_2 + 3x_3 &= 1.
\end{aligned}
\]
These equations are obviously inconsistent, because the left-hand side of the
third equation is just the sum of the left-hand sides of the first two equations,
whereas the right-hand sides are not so related (that is, $1 \neq 9 + 4$). In effect,
what we are saying is that there is a linear dependence between the rows of
the left-hand side of the equations which is not shared by the inhomogeneous
terms. The row linear dependence in the coefficient matrix A is obviously
dependent upon the rank of A and we now offer a brief discussion of one way
in which the general problem of consistency may be approached.
Obviously, when working conventionally with the individual equations
comprising (9-35) we know that: (a) equations may be scaled, (b) equations
may be interchanged, and (c) multiples of one equation may be added to
another. This implies that if we consider the coefficient matrix A of the system
and supplement it on the right by the elements of the inhomogeneous vector
K to form what is called the augmented matrix, then these same operations
are valid for the rows of the augmented matrix. Clearly, the rank will not be
affected by these operations. If the ranks of A and of the augmented matrix
denoted by (A, K) are the same, then the equations must be consistent;
otherwise they must be inconsistent.
Definition 9-16 (augmented matrix and elementary row operations)
Suppose that AX = K, where
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix},
\quad
X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\quad\text{and}\quad
K = \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix}.
\]
Then the augmented matrix, written (A, K), is defined to be the matrix
\[
(A, K) = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & k_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & k_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & k_n
\end{bmatrix}.
\]
An elementary row operation performed on an augmented matrix is any
one of the following:
(a) scaling of all elements in a row by a non-zero factor $\lambda$;
(b) interchange of any two rows;
(c) addition of a multiple of one row to another row.
An augmented matrix will be said to have been reduced to echelon form by
elementary row operations when the first non-zero element in any row is a
unity, and it lies to the right of the unity in the row above.
Example 9-20 Perform elementary row operations on the augmented
matrix corresponding to the inconsistent equations above to reduce them to
echelon form. Find the ranks of A and (A, K).
Solution The augmented matrix is
\[
(A, K) = \begin{bmatrix} 1 & 1 & 2 & 9 \\ 4 & -2 & 1 & 4 \\ 5 & -1 & 3 & 1 \end{bmatrix}.
\]
Subtract from the elements of row 3 the sum of the corresponding elements
in rows 1 and 2 to obtain
\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 4 & -2 & 1 & 4 \\ 0 & 0 & 0 & -12 \end{bmatrix}.
\]
Subtract from the elements of row 2, four times the corresponding elements
in row 1 to obtain
\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & -6 & -7 & -32 \\ 0 & 0 & 0 & -12 \end{bmatrix}.
\]
Divide row 2 by $-6$ and row 3 by $-12$ to obtain
\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & 7/6 & 16/3 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]
This is now in echelon form and the rank of the matrix comprising the first
three columns is 2, which must be the same as the rank of the coefficient
matrix A. The rank of (A, K) must be the same as the rank of the echelon
equivalent of the augmented matrix which is clearly 3.
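The reduction to echelon form and the rank comparison can be sketched as below. The function names (`echelon`, `rank`, `augmented`, `is_consistent`) are chosen here for illustration, and the order in which the row operations are applied is a systematic column-by-column sweep rather than the particular sequence used in the worked example, so intermediate matrices may differ even though the ranks agree.

```python
from fractions import Fraction


def echelon(matrix):
    """Reduce to echelon form: leading unities stepping to the right."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    lead = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(lead, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[lead], rows[pivot] = rows[pivot], rows[lead]      # operation (b)
        scale = rows[lead][col]
        rows[lead] = [x / scale for x in rows[lead]]           # operation (a)
        for i in range(lead + 1, len(rows)):                   # operation (c)
            factor = rows[i][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[lead])]
        lead += 1
        if lead == len(rows):
            break
    return rows


def rank(matrix):
    """Rank = number of non-zero rows in the echelon form."""
    return sum(1 for row in echelon(matrix) if any(row))


def augmented(a, k):
    return [row + [k[i]] for i, row in enumerate(a)]


def is_consistent(a, k):
    """Equations are consistent when rank A equals rank (A, K)."""
    return rank(a) == rank(augmented(a, k))


coefficients = [[1, 1, 2], [4, -2, 1], [5, -1, 3]]
constants = [9, 4, 1]
print(echelon(augmented(coefficients, constants)))
print(rank(coefficients), rank(augmented(coefficients, constants)))
print(is_consistent(coefficients, constants))
```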
The general conclusion that may be reached from the echelon form of an
augmented matrix (A, K), is that equations are consistent only when the ranks
of A and (A, K) are the same. If the equations are consistent, and A is of
order $(n \times n)$ and the rank r < n, we shall have fewer independent equations
than variables. In these circumstances we may solve for r of the variables $x_i$ in
terms of the n — r remaining ones which can then be assigned arbitrary
values.
Theorem 9-10 (solution of inhomogeneous systems) The inhomogeneous
system of equations
\[
AX = K,
\]
where A is of order $(n \times n)$ and X, K are of order $(n \times 1)$ has a unique
solution if $|A| \neq 0$. If $|A| = 0$, then the equations are only consistent when the
ranks of A and (A, K) are equal. In this case, if the rank r < n, it is possible to
solve for r variables in terms of the n - r remaining variables which may then
be assigned arbitrary values.
Example 9-21 Solve the following equations by reducing the augmented
matrix to echelon form:
\[
\begin{aligned}
x_1 + 3x_2 - x_3 &= 6 \\
8x_1 + 9x_2 + 4x_3 &= 21 \\
2x_1 + x_2 + 2x_3 &= 3.
\end{aligned}
\]
Solution The augmented matrix is
\[
(A, K) = \begin{bmatrix} 1 & 3 & -1 & 6 \\ 8 & 9 & 4 & 21 \\ 2 & 1 & 2 & 3 \end{bmatrix}.
\]
Subtract from the elements of row 2, the sum of three times the
corresponding element in row 3 and twice the corresponding element in row 1 to
obtain
\[
\begin{bmatrix} 1 & 3 & -1 & 6 \\ 0 & 0 & 0 & 0 \\ 2 & 1 & 2 & 3 \end{bmatrix}.
\]
Interchange rows two and three to obtain
\[
\begin{bmatrix} 1 & 3 & -1 & 6 \\ 2 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]
Subtract twice row 1 from row 2 and divide the resulting row 2 by $-5$ to
obtain
\[
\begin{bmatrix} 1 & 3 & -1 & 6 \\ 0 & 1 & -4/5 & 9/5 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]
This is now in echelon form and clearly the ranks of A and (A, K) are both 2
showing that the equations are consistent. However, only two equations
exist between the three variables $x_1$, $x_2$, and $x_3$, for the echelon form of the
augmented matrix may be seen to be equivalent to the two scalar equations
\[
x_1 + 3x_2 - x_3 = 6 \quad\text{and}\quad x_2 - \tfrac{4}{5}x_3 = \tfrac{9}{5}.
\]
Hence, assigning $x_3$ arbitrarily, we find that
\[
x_1 = \tfrac{3}{5} - \tfrac{7}{5}x_3 \quad\text{and}\quad x_2 = \tfrac{9}{5} + \tfrac{4}{5}x_3.
\]
When the inhomogeneous vector K = 0, the resulting system of equations
AX = 0 is said to be homogeneous. Consider the case of a homogeneous
system of n equations involving the n variables $x_1, x_2, \ldots, x_n$. Then it is
obvious that a trivial solution $x_1 = x_2 = \cdots = x_n = 0$ corresponding to
X = 0 always exists, but a non-trivial solution, in the sense that not all
$x_1, x_2, \ldots, x_n$ are zero, can only occur if $|A| = 0$. To see this notice that if
$|A| \neq 0$ then $A^{-1}$ exists, so that premultiplication of AX = 0 by $A^{-1}$ gives
at once the trivial solution X = 0 as being the only possible solution.
Conversely, if $|A| = 0$, then certainly at least one row of A is linearly
dependent upon the other rows, showing that not all of the variables $x_1,
x_2, \ldots, x_n$ need be zero.
When a non-trivial solution exists to a homogeneous system of n equations
involving n variables it cannot be unique, for if X is a solution vector,
then so also is $\lambda X$, where $\lambda$ is a scalar. As in our previous discussion, if the
rank of A which is of order $(n \times n)$ is r, then we may solve for r of the
variables $x_1, x_2, \ldots, x_n$ in terms of the n - r remaining ones which can then be
assigned arbitrary values.
Theorem 9-11 (solution of homogeneous systems) The homogeneous
system of equations
\[
AX = 0,
\]
where A is of order $(n \times n)$ and X, 0 are of order $(n \times 1)$ always has the
trivial solution X = 0. It has a non-trivial solution only when $|A| = 0$. If
A is of rank r < n, it is possible to solve for r variables in terms of the n - r
remaining variables which may then be assigned arbitrary values. If X is a
non-trivial solution, so also is $\lambda X$, where $\lambda$ is an arbitrary scalar.
Example 9-22 Solve the equations
\[
\begin{aligned}
x_1 - x_2 + x_3 &= 0 \\
2x_1 + x_2 - x_3 &= 0 \\
x_1 + 5x_2 - 5x_3 &= 0.
\end{aligned}
\]
Solution There is the trivial solution $x_1 = x_2 = x_3 = 0$ and, since the
determinant associated with the coefficient matrix vanishes, there are also
non-trivial solutions. The augmented matrix is now
\[
(A, 0) = \begin{bmatrix} 1 & -1 & 1 & 0 \\ 2 & 1 & -1 & 0 \\ 1 & 5 & -5 & 0 \end{bmatrix}
\]
which is easily reduced by elementary row transformations to the echelon
form
\[
\begin{bmatrix} 1 & -1 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]
This shows that there are only two equations between the three variables
$x_1$, $x_2$, and $x_3$, for the echelon form of the augmented matrix is seen to be
equivalent to the two scalar equations
\[
x_1 - x_2 + x_3 = 0 \quad\text{and}\quad x_2 - x_3 = 0.
\]
Hence, assigning $x_3$ arbitrarily, we have for our solution $x_1 = 0$ and
$x_2 = x_3 = k$ (say).
A practical numerical method of solution called Gaussian elimination is
usually used when dealing with inhomogeneous systems of n equations
involving n variables. This is essentially the same method as the one described
above for the reduction of an augmented matrix to echelon form. The only
difference is that it is not necessary to make the first non-zero element
appearing in any row in the position corresponding to the leading diagonal
equal to unity. We illustrate the method by example.
Example 9-23 Solve the following equations by Gaussian elimination:
\[
\begin{aligned}
x_1 - x_2 - x_3 &= 0 \\
3x_1 + x_2 + 2x_3 &= 6 \\
2x_1 + 2x_2 + x_3 &= 2.
\end{aligned}
\]
Solution The augmented matrix is
\[
(A, K) = \begin{bmatrix} 1 & -1 & -1 & 0 \\ 3 & 1 & 2 & 6 \\ 2 & 2 & 1 & 2 \end{bmatrix}.
\]
Subtracting three times row 1 from row 2 and twice row 1 from row 3 gives
\[
\begin{bmatrix} 1 & -1 & -1 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 4 & 3 & 2 \end{bmatrix}.
\]
Subtraction of row 2 from row 3 gives
\[
\begin{bmatrix} 1 & -1 & -1 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 0 & -2 & -4 \end{bmatrix}.
\]
The solution is now found by the process of 'back-substitution' using the
scalar equations corresponding to this modified augmented matrix. That is,
the equations
\[
\begin{aligned}
x_1 - x_2 - x_3 &= 0 \\
4x_2 + 5x_3 &= 6 \\
-2x_3 &= -4.
\end{aligned}
\]
The last equation gives $x_3 = 2$ and, using this result in the second then gives
$x_2 = -1$. Combination of these results in the first equation then gives
$x_1 = 1$.
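Gaussian elimination followed by back-substitution might be sketched as follows. The function name `gauss_solve` is chosen here; a row interchange is made only when a zero appears in the pivot position, which is an assumption of this sketch (practical codes usually interchange rows to bring the largest available element into the pivot position), and `None` is returned when no unique solution exists.

```python
from fractions import Fraction


def gauss_solve(a, k):
    """Solve the n x n system a x = k; return None if |a| = 0."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(k[i])] for i, row in enumerate(a)]
    # Forward elimination: produce zeros below the leading diagonal.
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    # Back-substitution, starting from the last equation.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


# The system of Example 9-23.
A = [[1, -1, -1], [3, 1, 2], [2, 2, 1]]
K = [0, 6, 2]
print(gauss_solve(A, K))
```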
It is not proposed to offer more than a few general remarks about the
solutions of m equations involving n variables. If the equations are con-
sistent, but there are more equations than variables so that m > n, it is clear
that there must be linear dependence between the equations. In the case that
the rank of the coefficient matrix is equal to n there will obviously be a
unique solution for, despite appearances, there will be only n linearly
independent equations involving n variables. If, however, the rank r is less than n
we are in the situation of solving for r variables $x_1, x_2, \ldots, x_r$ in terms of the
remaining n - r variables whose values may be assigned arbitrarily. In the
remaining case where there are fewer equations than variables we have
m < n. When this system is consistent it follows that at least n — m variables
must be assigned arbitrary values.
9-8 Eigenvalues and eigenvectors
Let us examine the consequence of requiring that in the system
\[
AX = K,
\tag{9-40}
\]
where A is of order $(n \times n)$ and X, K are of order $(n \times 1)$, the vector K is
proportional to the vector X itself. That is, we are requiring that $K = \lambda X$,
where $\lambda$ is some scalar multiplier as yet unknown. This requires us to solve
the system
\[
AX = \lambda X,
\tag{9-41}
\]
which is equivalent to the homogeneous system
\[
(A - \lambda I)X = 0,
\tag{9-42}
\]
where I is the unit matrix.
Now we know from Theorem 9-11 that Eqn (9-42) can only have a
non-trivial solution when the determinant associated with the coefficient matrix
vanishes, so that we must have
\[
|A - \lambda I| = 0.
\tag{9-43}
\]
When expanded, this determinant gives rise to an algebraic equation of
degree n in $\lambda$ of the form
\[
\lambda^n + \alpha_1\lambda^{n-1} + \alpha_2\lambda^{n-2} + \cdots + \alpha_n = 0.
\tag{9-44}
\]
The determinant (9-43) is called the characteristic determinant associated with
A and Eqn (9-44) is called the characteristic equation. It has n roots $\lambda_1, \lambda_2,
\ldots, \lambda_n$, each of which is called either an eigenvalue, a characteristic root, or,
in some texts, a latent root of A.
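Example 9-24 Find the characteristic equation and the eigenvalues
corresponding to
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}.
\]
Solution We have
\[
A - \lambda I = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 - \lambda & 2 \\ 3 & -\lambda \end{bmatrix}
\]
so that
\[
|A - \lambda I| = \begin{vmatrix} 1 - \lambda & 2 \\ 3 & -\lambda \end{vmatrix} = \lambda^2 - \lambda - 6.
\]
Thus the characteristic equation is
\[
\lambda^2 - \lambda - 6 = 0,
\]
and its roots, the eigenvalues of A, are $\lambda = 3$ and $\lambda = -2$.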
No consideration will be given here to the interpretation that is to be
placed on the appearance of repeated roots of the characteristic equation,
and henceforth we shall always assume that all the eigenvalues (roots) are
distinct.
Returning to Eqn (9-42) and setting $\lambda = \lambda_i$, where $\lambda_i$ is any one of the
eigenvalues, we can then find a corresponding solution vector $X_i$ which,
because of Theorem 9-11, will only be determined to within an arbitrary
scalar multiplier. This vector $X_i$ is called either an eigenvector, a characteristic
vector, or a latent vector of A corresponding to $\lambda_i$. The eigenvectors of a
square matrix A are of fundamental importance in both the theory of matrices
and in their application, and some indication of this will be given later in
Section 15-8.
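Example 9-25 Find the eigenvectors of the matrix A in Example 9-24.
Solution Use the fact that the eigenvalues have been determined as being
$\lambda = 3$ and $\lambda = -2$ and make the identifications $\lambda_1 = 3$ and $\lambda_2 = -2$. Now
let the eigenvectors $X_1$ and $X_2$, corresponding to $\lambda_1$ and $\lambda_2$, be denoted by
\[
X_1 = \begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix}
\quad\text{and}\quad
X_2 = \begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix}.
\]
Then for the case $\lambda = \lambda_1$, Eqn (9-42) becomes
\[
\begin{bmatrix} (1 - 3) & 2 \\ 3 & (0 - 3) \end{bmatrix}
\begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix} = 0,
\]
whence
\[
-2x_1^{(1)} + 2x_2^{(1)} = 0 \quad\text{and}\quad 3x_1^{(1)} - 3x_2^{(1)} = 0.
\]
These are automatically consistent by virtue of their manner of definition,
so that we find that $x_1^{(1)} = x_2^{(1)}$. So, arbitrarily assigning to $x_1^{(1)}$ the value
$x_1^{(1)} = 1$, we find that the eigenvector $X_1$ corresponding to $\lambda_1 = 3$ is
\[
X_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]
A similar argument for $\lambda = \lambda_2$ gives
\[
\begin{bmatrix} (1 + 2) & 2 \\ 3 & (0 + 2) \end{bmatrix}
\begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix} = 0,
\]
whence
\[
3x_1^{(2)} + 2x_2^{(2)} = 0.
\]
Again, arbitrarily assigning to $x_1^{(2)}$ the value $x_1^{(2)} = 1$, we find that
$x_2^{(2)} = -3/2$. Thus the eigenvector $X_2$ corresponding to $\lambda_2 = -2$ is
\[
X_2 = \begin{bmatrix} 1 \\ -\tfrac{3}{2} \end{bmatrix}.
\]
Obviously $\mu X_1$ and $\mu X_2$ are also eigenvectors for any arbitrary scalar $\mu$.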
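For a matrix of order (2 x 2) the characteristic equation is the quadratic $\lambda^2 - (a_{11} + a_{22})\lambda + |A| = 0$, so the calculation above is easy to sketch in code. The function names below are chosen here; the sketch assumes real, distinct eigenvalues and a non-zero element $a_{12}$, and it scales each eigenvector so that its first element is 1, as in the worked example.

```python
import math


def eigenvalues_2x2(a):
    """Roots of the characteristic equation of a 2 x 2 matrix (real case only)."""
    trace = a[0][0] + a[1][1]
    determinant = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = trace * trace - 4 * determinant
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return ((trace + root) / 2, (trace - root) / 2)


def eigenvector_2x2(a, lam):
    """Solve the first row of (A - lam I) X = 0 with x1 set to 1."""
    if a[0][1] == 0:
        raise ValueError("this sketch assumes a[0][1] is non-zero")
    return (1.0, -(a[0][0] - lam) / a[0][1])


A = [[1, 2], [3, 0]]              # matrix of Example 9-24
for lam in eigenvalues_2x2(A):
    print(lam, eigenvector_2x2(A, lam))
```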
9-9 Linear transformations
Any introductory account of matrices would be incomplete were the basic
idea of a linear transformation not to be mentioned. Some discussion of this
important concept has already been offered in Section 9-1, and we now
develop the idea a little further. Indeed, to recapitulate briefly, it was ex-
plained there how a linear transformation is just a simple form of mapping of
the points of one plane into the points of another. This idea is still useful
when a matrix vector X of order $(n \times 1)$ is mapped by a matrix transformation
into what is called its image $\tilde{X}$ under the transformation. In this context
the elements of X are usually considered to be the components of a vector
in an n-dimensional space, so that X then specifies a point in that space,
and $\tilde{X}$ is its image point under the linear transformation. We propose to
work with the following straightforward definition of such a transformation.
Definition 9-17 (linear transformation) A general linear transformation
or point transformation of the vector X of order $(n \times 1)$ into the image $\tilde{X}$ of
order $(n \times 1)$ is defined to be a transformation of the form
\[
\tilde{X} = AX + K,
\]
where the coefficient matrix A is of order $(n \times n)$ and the vector K is of order
$(n \times 1)$.
The special case considered in Section 9-1 involved a mapping of points
of the plane brought about solely by a rotation of the plane through an angle
$\theta$ about the origin. In that case the transformation corresponded to K = 0,
and
\[
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\tag{9-45}
\]
This matrix is called an orthogonal matrix because it has the property that
$A' = A^{-1}$, and it is representative of a very important class of square matrices.
The first row of A is seen to contain the direction cosines of Ox' with respect
to Ox and Oy, whilst the second row contains the direction cosines of Oy'
with respect to Ox and Oy.
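The orthogonality property $A' = A^{-1}$ is equivalent to $A'A = AA' = I$, which can be checked numerically for the rotation matrix. The sketch below takes the matrix in the form $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ (the property holds equally if the sign of $\sin\theta$ is reversed); the function names and the tolerance are choices made here.

```python
import math


def rotation(theta):
    """Plane rotation matrix through the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(p, q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]


def is_identity(m, tol=1e-12):
    """True if m equals the unit matrix to within rounding error."""
    n = len(m)
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


A = rotation(0.7)
print(is_identity(matmul(transpose(A), A)), is_identity(matmul(A, transpose(A))))
```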
More generally, consider the rectangular axes $O\{x_1, x_2, x_3\}$ which are
arbitrarily rotated about origin O to form the axes system $O\{x_1', x_2', x_3'\}$, in
which the direction cosine of $Ox_i'$ with respect to $Ox_j$ becomes $\nu_{ij}$. Then the
matrix
\[
A = \begin{bmatrix}
\nu_{11} & \nu_{12} & \nu_{13} \\
\nu_{21} & \nu_{22} & \nu_{23} \\
\nu_{31} & \nu_{32} & \nu_{33}
\end{bmatrix}
\tag{9-46}
\]
is strictly analogous to matrix (9-45), and it is easily seen that X and $\tilde{X}$ are
related by
\[
\tilde{X} = AX.
\tag{9-47}
\]
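In the special case that the rotation is only about the $x_3$-axis through an
angle $\theta$ in the sense shown in Fig. 9-1, then $\nu_{13} = \nu_{31} = \nu_{32} = \nu_{23} = 0$ and
$\nu_{33} = 1$, and
\[
A = \begin{bmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}.
\tag{9-48}
\]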
When discussing an application of a linear transformation to the theory of
elasticity in the next section we shall have occasion to refer to this matrix
again.
Aside from the rotation transformation characterized by Eqns (9-46)
and (9-47) there are three other simple transformations worthy of note and
these are listed below. It is left as an exercise for the reader to verify their
main properties when related to the plane which give rise to their names.
1. The identity transformation This is the transformation $\tilde{X} = X$, and it
corresponds to the case K = 0 and A = I. Under this transformation X and
its image $\tilde{X}$ are coincident.
2. The translation transformation This is the transformation $\tilde{X} = X + K$,
and it corresponds to an arbitrary non-zero vector K and A = I. The effect
of the transformation is to translate X to its image $\tilde{X}$, without rotation or
change of scale.
3. The dilatation transformation This is a transformation $\tilde{X} = AX$, in which A
is a non-singular diagonal matrix. Its effect when mapping X into $\tilde{X}$ is to
change the scale of the different elements of X without translation or rotation.
In the special case that all the diagonal elements are equal, say to $\lambda$, where
$\lambda > 1$, its effect is one of a magnification of X.
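Example 9-26 If
\[
X = \begin{bmatrix} x \\ y \end{bmatrix},
\quad
\tilde{X} = \begin{bmatrix} x' \\ y' \end{bmatrix}
\quad\text{and}\quad
A = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix},
\]
deduce the image of the curve $y = \sinh x$ under the transformation
$\tilde{X} = AX$.
Solution We have
\[
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},
\]
so that
\[
x' = 5x \quad\text{and}\quad y' = 2y.
\]
Thus the image curve of $y = \sinh x$ is given parametrically by
\[
x' = 5x \quad\text{and}\quad y' = 2\sinh x
\]
or, equivalently, by
\[
y' = 2\sinh(x'/5).
\]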
9-10 Applications of matrices and linear transformations
It is the object of this final section to indicate a few of the diverse applications
of the work of this chapter. Of necessity, we will be able to do no more than
outline this large and fruitful field of study, and for our first example we
look to the notion of rank to enable us to prove an important result in
dimensional analysis known as the Buckingham Pi theorem.
9-10 (a) Application of rank to dimensional analysis: Buckingham Pi
Theorem
In many branches of engineering and science, a valuable method of approach
to difficult problems is via the method of dimensional analysis touched on
briefly at the start of Chapter 5. In essence, this method seeks first to char-
acterize a physical situation by forming dimensionless groups from the
variables involved, and then to determine the functional relationships which
relate these dimensionless groups. Our contribution will be to the first part
of this process, for we shall determine how many dimensionless groups exist.
Let us suppose that a physical situation is described by n variables
$u_1, u_2, \ldots, u_n$, each of which corresponds to a physical quantity. Suppose
also that each of these quantities is capable of expression dimensionally in
terms of length [L], mass [M], and time [T], and that $u_i$ has dimensions
\[
[L]^{a_i}[M]^{b_i}[T]^{c_i}.
\]
Then the product of powers
\[
u_1^{k_1}u_2^{k_2}\cdots u_n^{k_n},
\tag{9-49}
\]
where $k_1, k_2, \ldots, k_n$ are real numbers, must have dimensions
\[
[L]^{a_1k_1 + a_2k_2 + \cdots + a_nk_n}\,[M]^{b_1k_1 + b_2k_2 + \cdots + b_nk_n}\,[T]^{c_1k_1 + c_2k_2 + \cdots + c_nk_n}.
\]
Such products of powers will be dimensionless, in the sense that they are
pure numbers having dimensions
\[
[L]^0[M]^0[T]^0,
\]
only if
\[
\begin{aligned}
a_1k_1 + a_2k_2 + \cdots + a_nk_n &= 0 \\
b_1k_1 + b_2k_2 + \cdots + b_nk_n &= 0 \\
c_1k_1 + c_2k_2 + \cdots + c_nk_n &= 0
\end{aligned}
\]
or, equivalently, if
\[
\begin{bmatrix}
a_1 & a_2 & \cdots & a_n \\
b_1 & b_2 & \cdots & b_n \\
c_1 & c_2 & \cdots & c_n
\end{bmatrix}
\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} = 0.
\tag{9-50}
\]
Now if the rank of the coefficient matrix of order $(3 \times n)$ in Eqn (9-50)
is r, then we know from the work of Section 9-7 that it is possible to express
r of the variables $k_1, k_2, \ldots, k_n$ in terms of the remaining n - r variables,
which may be assigned arbitrary values.
That is to say, it will be possible to form n - r independent dimensionless quantities
$\pi_1, \pi_2, \ldots, \pi_{n-r}$ from the n variables $u_1, u_2, \ldots, u_n$. The dimensionless
variables $\pi_i$ are called Pi-variables. Hence we have proved the following
result.
Theorem 9-12 (Buckingham Pi theorem) Let a physical situation be
capable of description in terms of n physical quantities $u_1, u_2, \ldots, u_n$, where
$u_i$ has dimensions $[L]^{a_i}[M]^{b_i}[T]^{c_i}$. Then, if r is the rank of the matrix
\[
\begin{bmatrix}
a_1 & a_2 & \cdots & a_n \\
b_1 & b_2 & \cdots & b_n \\
c_1 & c_2 & \cdots & c_n
\end{bmatrix},
\]
the physical situation is capable of description in terms of n - r dimensionless
variables $\pi_1, \pi_2, \ldots, \pi_{n-r}$ formed from the variables $u_1, u_2, \ldots, u_n$.
This is best illustrated by example. In the slow viscous flow of a fluid
between parallel planes, some functional relationship of the form
$$V = f(k, d, \eta)$$
exists between the average flow velocity V, the pressure gradient k along the
flow, the distance d between the planes and the viscosity $\eta$. The dimensions
of these quantities which will form the matrix in the Buckingham Pi theorem
are shown in the table below:
         V     k     d     η
    L    1    -2     1    -1
    M    0     1     0     1
    T   -1    -2     0    -1
The rank of the (3 x 4) matrix whose elements comprise the entries in
this table is 3, as may be seen, for example, by using elementary row
operations to reduce it to its echelon equivalent
$$\begin{bmatrix} 1 & -2 & 1 & -1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix},$$
in which the determinant formed from the first three columns is non-zero.
Thus, from the conditions of the theorem, the number of $\pi$ variables is
4 - 3 = 1. A dimensionless grouping in this case is $kd^2/\eta V$, and any product
of powers of the form shown in (9-49) must be a power of this one dimensionless
group. Hence this physical problem is capable of description in terms of
the one dimensionless grouping $\pi = kd^2/\eta V$. As the velocity profile across
the flow only depends on the distance x from one of the walls, our result
implies that all such flows will be characterized by one curve describing the
variation of $\pi$ with x/d.
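The counting argument for the viscous flow example can be checked mechanically. The following is a minimal sketch, not part of the original text: the helper name `matrix_rank` is hypothetical, and exact fractions are used so that the rank is not affected by rounding. It computes the rank of the dimension matrix and confirms that the exponents of V, k, d and η in the grouping kd²/ηV satisfy the homogeneous system (9-50).

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero pivot in this column at or below row `rank`.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank

# Columns: V, k, d, eta.  Rows: exponents of L, M, T.
dimension_matrix = [
    [1, -2, 1, -1],
    [0, 1, 0, 1],
    [-1, -2, 0, -1],
]

n = 4                              # number of physical quantities
r = matrix_rank(dimension_matrix)  # rank of the dimension matrix
num_pi = n - r                     # number of Pi-variables

# Exponents (k_1, ..., k_4) of V, k, d, eta in the grouping k d^2 / (eta V).
exponents = [-1, 1, 2, -1]
residual = [sum(a * k for a, k in zip(row, exponents)) for row in dimension_matrix]

print(r, num_pi, residual)
```

A zero residual in every row means the grouping has dimensions [L]⁰[M]⁰[T]⁰, as required.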
9-10 (b) Differentials as linear transformations
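We now consider a generalization of the total differential as described in
Theorem 5-19 and subsequently used in Theorem 5-22. Let us suppose that
$$\begin{aligned}
u_1 &= f_1(x_1, x_2, \ldots, x_n) \\
u_2 &= f_2(x_1, x_2, \ldots, x_n) \\
&\;\;\vdots \\
u_n &= f_n(x_1, x_2, \ldots, x_n)
\end{aligned} \tag{9-51}$$
then it follows from Theorem 5-19 and the properties of matrices that
$$\begin{bmatrix} du_1 \\ du_2 \\ \vdots \\ du_n \end{bmatrix} =
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial x_1} & \dfrac{\partial f_n}{\partial x_2} & \cdots & \dfrac{\partial f_n}{\partial x_n}
\end{bmatrix}
\begin{bmatrix} dx_1 \\ dx_2 \\ \vdots \\ dx_n \end{bmatrix}. \tag{9-52}$$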
This can be written
$$d\mathbf{u} = \mathbf{A}\, d\mathbf{x} \tag{9-53}$$
by identifying du, dx with the (n x 1) column vectors in Eqn (9-52) in the
obvious manner, and A with the (n x n) matrix of partial derivatives.
Viewed in this light, Eqn (9-53) may be seen to be a local linear transformation
mapping dx into du. The adjective local is used here because the transformation
will only be a linear transformation when A is a constant matrix, and as
the elements of A are functions of $x_1, x_2, \ldots, x_n$, they can only be approximated
by constants in the neighbourhood of any fixed point P with coordinates
$(x_1^P, x_2^P, \ldots, x_n^P)$. For different points P, the transformation A
will be different, showing that Eqn (9-53) represents a more general type of
transformation than a general linear point transformation.
Transformation (9-53) will be one-to-one provided that $\mathbf{A}^{-1}$ exists, for
then a unique inverse mapping
$$d\mathbf{x} = \mathbf{A}^{-1}\, d\mathbf{u} \tag{9-54}$$
will exist. The condition for this is, of course, that $|\mathbf{A}| \neq 0$ at the point P.
This will be recognized as the non-vanishing Jacobian condition already
encountered in Chapter 5.
Fig. 9-2 Spherical polar coordinates.
By way of example, consider the relationship between the spherical polar
coordinates $(r, \phi, \theta)$ and the Cartesian coordinates (x, y, z) illustrated in
Fig. 9-2 and described by
$$\begin{aligned}
x &= r \sin\theta \cos\phi \\
y &= r \sin\theta \sin\phi \\
z &= r \cos\theta.
\end{aligned}$$
Making the identifications $u_1 = x$, $u_2 = y$, $u_3 = z$, and $x_1 = r$, $x_2 = \theta$,
$x_3 = \phi$, a simple calculation shows that Eqn (9-52) will take the form
$$\begin{bmatrix} dx \\ dy \\ dz \end{bmatrix} =
\begin{bmatrix}
\sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\
\sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\
\cos\theta & -r\sin\theta & 0
\end{bmatrix}
\begin{bmatrix} dr \\ d\theta \\ d\phi \end{bmatrix}. \tag{9-55}$$
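Denoting the square matrix in Eqn (9-55) by A, it is easily established that the
Jacobian determinant $|\mathbf{A}| = r^2 \sin\theta$. Calculating the inverse matrix $\mathbf{A}^{-1}$
and using it to deduce the inverse mapping we have, provided $r^2 \sin\theta \neq 0$,
that
$$\begin{bmatrix} dr \\ d\theta \\ d\phi \end{bmatrix} = \frac{1}{r^2\sin\theta}
\begin{bmatrix}
r^2\sin^2\theta\cos\phi & r^2\sin^2\theta\sin\phi & r^2\sin\theta\cos\theta \\
r\sin\theta\cos\theta\cos\phi & r\sin\theta\cos\theta\sin\phi & -r\sin^2\theta \\
-r\sin\phi & r\cos\phi & 0
\end{bmatrix}
\begin{bmatrix} dx \\ dy \\ dz \end{bmatrix}. \tag{9-56}$$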
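The matrices in Eqns (9-55) and (9-56) can be checked numerically at any point where r² sin θ is non-zero. The sketch below is illustrative only: the function names and the sample point are assumptions, not part of the text. It evaluates both matrices, compares the determinant with r² sin θ, and confirms that their product is the unit matrix to rounding error.

```python
import math

def jacobian(r, theta, phi):
    """Matrix A of Eqn (9-55) for spherical polar coordinates."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [
        [st * cp, r * ct * cp, -r * st * sp],
        [st * sp, r * ct * sp, r * st * cp],
        [ct, -r * st, 0.0],
    ]

def inverse_jacobian(r, theta, phi):
    """Matrix of Eqn (9-56), including the factor 1/(r^2 sin theta)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    scale = 1.0 / (r * r * st)
    rows = [
        [r * r * st * st * cp, r * r * st * st * sp, r * r * st * ct],
        [r * st * ct * cp, r * st * ct * sp, -r * st * st],
        [-r * sp, r * cp, 0.0],
    ]
    return [[scale * x for x in row] for row in rows]

def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def matmul(a, b):
    """Product of two 3 x 3 matrices ('rows into columns')."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

r, theta, phi = 2.0, 0.7, 1.1          # arbitrary sample point with sin(theta) != 0
A = jacobian(r, theta, phi)
A_inv = inverse_jacobian(r, theta, phi)

det_error = abs(det3(A) - r * r * math.sin(theta))
product = matmul(A_inv, A)
identity_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                     for i in range(3) for j in range(3))
print(det_error, identity_error)
```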
9-10 (c) Linear transformation of the stress tensor
In the mathematical theory of elasticity it is useful to introduce the concept
of the stress vector associated with any plane element of area within a solid
body. The magnitude of the stress vector is the force per unit area acting on
that plane element of area, and its sense is the sense of the force which is
exerted on that element located at point P, say, by the surrounding material.
In a solid, unlike a liquid, this force depends on the orientation of the element
of area, and it is convenient to describe the situation at point P by considering
elements of plane area normal to each of the unit vectors $x_1, x_2, x_3$ of a
rectangular Cartesian system $O\{x_1, x_2, x_3\}$. If the components in the $x_1$, $x_2$,
and $x_3$ directions of the stress acting on the element of area with $x_i$ as its
normal are $\tau_{i1}$, $\tau_{i2}$, and $\tau_{i3}$, then the complete information concerning the
components of stress acting on all three mutually orthogonal elements of
area at P will be contained in the following table:

    Stress Components at P
                               1              2              3
    Surface Normal to x_1   $\tau_{11}$   $\tau_{12}$   $\tau_{13}$
    Surface Normal to x_2   $\tau_{21}$   $\tau_{22}$   $\tau_{23}$
    Surface Normal to x_3   $\tau_{31}$   $\tau_{32}$   $\tau_{33}$

In general there will be a different table of this type for each point P in the
solid.
The matrix T defined by
$$\mathbf{T} = \begin{bmatrix} \tau_{11} & \tau_{12} & \tau_{13} \\ \tau_{21} & \tau_{22} & \tau_{23} \\ \tau_{31} & \tau_{32} & \tau_{33} \end{bmatrix}$$
is called the stress tensor at the point P, and it is fundamental to the develop-
ment of the mathematical theory of elasticity. Let us now indicate how the
stress tensor transforms when axes centred on P are rotated, since this is a
situation of considerable practical importance, being related to the deter-
mination of the directions of minimum and maximum stress at any point in
a solid body.
Fig. 9-3 Rotation θ about x_3-axis.
For this purpose we shall assume that no external moments act on the
body, for then it can be shown that T is symmetric. In addition, we will set
$\tau_{13} = \tau_{23} = \tau_{33} = 0$, which characterizes what is called a plane state of stress,
since all the forces then lie in the $(x_1, x_2)$-plane. The appropriate rotation
matrix A relating the system $O\{x_1, x_2, x_3\}$ to $O\{x_1', x_2', x_3'\}$ when a rotation
$\theta$ about the $x_3$-axis has been made is that given in Eqn (9-48) (see Fig. 9-3).
Hence, setting
$$\mathbf{F} = \mathbf{A}\mathbf{T}, \tag{9-57}$$
then the elements of row i of F will contain the components of the transformed
force vector acting on the element of area with $Ox_i'$ as normal. To
relate this result to the stress components $\tau_{ij}'$ relative to the new axes
$O\{x_1', x_2', x_3'\}$, we must use the fact that $\tau_{ij}'$ is equal to the projection of the
force acting on the element of area normal to $Ox_i'$ along $Ox_j'$. To achieve
this result by matrices we must post-multiply A by the transpose F' of F.
This is so because row i of A contains the direction cosines of axis $Ox_i'$ and
row j of F contains the components of the force acting on the element of
area with $Ox_j'$ as normal, and the rule for matrix multiplication is 'rows into
columns'. Thus if $\tilde{\mathbf{T}}$ is the transformed stress tensor, then
$$\tilde{\mathbf{T}} = \mathbf{A}\mathbf{F}' = \mathbf{A}\mathbf{T}'\mathbf{A}', \tag{9-58}$$
but as T is symmetric, T' = T, giving
$$\tilde{\mathbf{T}} = \mathbf{A}\mathbf{T}\mathbf{A}'. \tag{9-59}$$
Using the fact that
$$\mathbf{T} = \begin{bmatrix} \tau_{11} & \tau_{12} & 0 \\ \tau_{12} & \tau_{22} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
when evaluating the indicated matrix products in Eqn (9-59) then shows that
the stress components of the transformed stress tensor are
$$\begin{aligned}
\tau_{11}' &= \tau_{11}\cos^2\theta + \tau_{22}\sin^2\theta - 2\tau_{12}\sin\theta\cos\theta \\
\tau_{22}' &= \tau_{11}\sin^2\theta + \tau_{22}\cos^2\theta + 2\tau_{12}\sin\theta\cos\theta \\
\tau_{12}' &= (\tau_{11} - \tau_{22})\sin\theta\cos\theta + \tau_{12}(\cos^2\theta - \sin^2\theta)
\end{aligned} \tag{9-60}$$
with
$$\tau_{13}' = \tau_{23}' = \tau_{33}' = 0.$$
These results form the basis of many important studies involving plane
stress in solids on which no external moment is acting.
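The plane-stress transformation can be verified by carrying out the product A T A' numerically. The sketch below is a hedged illustration: Eqn (9-48) is not reproduced in this section, so the rotation matrix is assumed to have first two rows (cos θ, -sin θ, 0) and (sin θ, cos θ, 0), which is the form consistent with the signs in (9-60). The stress values and function names are arbitrary choices made for the example.

```python
import math

def matmul(a, b):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def rotated_stress(t11, t12, t22, theta):
    """Return A T A' for a plane state of stress."""
    c, s = math.cos(theta), math.sin(theta)
    # Assumed form of the rotation matrix (chosen to agree with Eqn (9-60)).
    A = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T = [[t11, t12, 0.0], [t12, t22, 0.0], [0.0, 0.0, 0.0]]
    return matmul(matmul(A, T), transpose(A))

def closed_form(t11, t12, t22, theta):
    """Transformed components as written in Eqn (9-60)."""
    c, s = math.cos(theta), math.sin(theta)
    new11 = t11 * c * c + t22 * s * s - 2.0 * t12 * s * c
    new22 = t11 * s * s + t22 * c * c + 2.0 * t12 * s * c
    new12 = (t11 - t22) * s * c + t12 * (c * c - s * s)
    return new11, new22, new12

t11, t12, t22, theta = 50.0, 20.0, -10.0, 0.4   # arbitrary sample values
T_new = rotated_stress(t11, t12, t22, theta)
f11, f22, f12 = closed_form(t11, t12, t22, theta)

max_diff = max(abs(T_new[0][0] - f11), abs(T_new[1][1] - f22),
               abs(T_new[0][1] - f12), abs(T_new[1][0] - f12))
trace_diff = abs((T_new[0][0] + T_new[1][1]) - (t11 + t22))
print(max_diff, trace_diff)
```

The second check uses the fact that the sum of the first two equations of (9-60) is independent of θ, so the sum of the two direct stresses is unchanged by the rotation.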
PROBLEMS
Section 9-1
9-1 Suggest two physical situations in which the outcomes may be displayed in
the form of a matrix.
9-2 Find the sum A + B and difference A — B of the matrices
" 1 2
3
4 "
" 2 3
1
2 "
A =
2 12 2
.12 0.
, B =
2 2
. 1 -2 1 1 .
9-3 Evaluate the following inner products:
"1"
"2"
" 2"
(a) [2 113]
2
2
; (b) [1 -2 7 4]
3
; (c) [2 -1
3 1]
-1
3
-1-
-1-
- 1.
9-4 Evaluate the following matrix products:
3 12"
"1
2"
1
-1
12 2 2
; (b)
1 1 1 0_
-1
1-
(a)
9-5 State which of the following forms of matrix product are defined and, where
appropriate, give the shape of the resulting product matrix:
-1
2 1
3"
~-i r
2 2
1
-1 1
-1
1 1
1
1.
- 3 2-
(a) (7 x 3)(3 x 9);
(b) (5 x 3)(2 x 3);
(c) (1 x 9)(9 x 1);
(d) (3 x 1)(1 x 4).
9-6 If the matrices I, A, and B are given by
1 =
"1 0"
"2 1 3"
1
, A =
1 2 1
.0 1_
.5 1 4.
and B =
1
2
1
-1
3
1
2
1
2
1
show that
(a) IA = AI = A;
(b) I B = B but that B I is not defined.
9-7 Give an example of matrices A and B for which:
(a) the product A B is defined but the product B A is not;
(b) the products A B and B A are both defined but are matrices of different
order ;
(c) the products A B and B A are both defined and are the same order as
A and B, but they are not equal.
9-8 Display each of the following sets of simultaneous equations in matrix form:
(a) 2x + 4y + z = 9
x - 3y + 2z = -4
x + y - z = 1,
(b) w + 2x - y = 4
x - 3y + 2z = -1
2w + 5x - 3z = 0
4w - y + 4z = 2,
(c) 3w + x-2y + 4z=l
w — 3x + y — 3z = 4
w + Ix + 2y + 5z = 2,
(d) 2x + y - z = λx
3x + 2y + 4z = λy
x - 3y + 2z = λz.
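9-9 Let matrices $\mathbf{R}_\theta$ and $\mathbf{R}_\phi$ be defined as follows:
$$\mathbf{R}_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\quad\text{and}\quad
\mathbf{R}_\phi = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$
and let
$$\mathbf{X} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad
\mathbf{X}' = \begin{bmatrix} x' \\ y' \end{bmatrix} \quad\text{and}\quad
\mathbf{X}'' = \begin{bmatrix} x'' \\ y'' \end{bmatrix}.$$
Then, if
$$\mathbf{X}' = \mathbf{R}_\theta \mathbf{X} \quad\text{and}\quad \mathbf{X}'' = \mathbf{R}_\phi \mathbf{X}',$$
show by matrix multiplication and use of trigonometric identities that
$$\mathbf{R}_\phi[\mathbf{R}_\theta \mathbf{X}] = \mathbf{R}_{\theta+\phi}\mathbf{X}.$$
Interpret this result geometrically.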
Section 9-2
9-10 Construct the matrices of order (3 x 2) whose general element $a_{ij}$ has the form:
(a) $a_{ij} = i^2 + j^2 - 2ij$;
(b) $a_{ij} = \sin i\theta \cos j\theta$.
9-11 State which of the following pairs of matrices can be made equal by assigning
suitable values to the constants a, b, and c. Where appropriate, determine what
these values must be.
fa)
(b)
"1 2 1
0"
"1 2
1 0"
3 a b
2
and
3 1
2 2
»
1 2 c
1_
.1 2
4 1.
1 5 a
2"
"1 5
1 2"
2 «2 3
b
and
2 4
3 4
>
.4 3 2
c_
.4 3
2 1.
1
(a + b) 3 "
"1 4 3"
(fl + c)
2 4
and
2 4
1
2 (b
+ c)_
.1 2 2.
(c)
9-12 Find the numbers a, b, c, and d in order that the following matrix equation
should be valid:
'6 4 6"
6 -1 -2
.3 6 6.
9-13 Use Definitions 9-3 and 9-4 to prove that if λ, μ are scalars and matrices A
and B are conformable for addition, then
(a) λ(A + B) = λA + λB,
(b) λA + μA = (λ + μ)A.
9-14 Determine 3 A + 2B and 2A — 6B given that
"2 -
2a 1
5"
"2 3 r
3 2
-b
+
3 </ 4
=
3c 4
1.
.3 2 5.
_ ri 37'
~ L2 -1 6
and B =
1 *]•
3 2j
9-15 If
and
D
'2 1
3 2
1
B =
1 1 0"
1 1
3 1
"2 3 4~
, c =
1 5 6
find the matrix products A B and C D.
9-16 This example shows that the matrix product A B = 0 does not necessarily
imply either that A = 0 or that B = 0. If,
T 1 -1 11
-3 2 -1
-2
1
oJ
and B =
"1 2
2 4
.1 2
find A B and B A and show that A B ≠ B A.
9-17 Show that the matrix equation
AX = K,
where
A =
"i 3 r
~Xl~
l i 2
, x =
X2
.2 2 0.
. x ^
and K =
may be solved for $x_1$, $x_2$, and $x_3$ by pre-multiplication by B, where
-i i
B =
-i -
i -iJ
9-18 Use matrix multiplication to verify the results of Theorem 9-2 when A, B,
and C are of the form
"1 3 2"
--1
2
r
~-2 2 1
1 4
, B =
3
-2
-l
, and C =
2 4
.2 3 1_
. 1
4
2_
.13 1
9-19 If A is a square matrix, then the associative property of matrices allows us to
write $\mathbf{A}^n$ without ambiguity because, for example, $\mathbf{A}^3 = \mathbf{A}(\mathbf{A}\mathbf{A}) = (\mathbf{A}\mathbf{A})\mathbf{A}$. If
$$\mathbf{A} = \begin{bmatrix} \cosh x & \sinh x \\ \sinh x & \cosh x \end{bmatrix},$$
use the hyperbolic identities to express $\mathbf{A}^2$ and $\mathbf{A}^3$ in their simplest form and
use induction to deduce the form of $\mathbf{A}^n$.
2 5 7 9"
; (0
4 3 1_
4
2
19'
2
4
9-20 Transpose the following matrices:
(a) [1 4 17 3]; (b)
(d)
9-21 Use Definition 9-7 and Theorem 9-3 to prove that:
(a) the sum of a square matrix and its transpose is a symmetric matrix;
(b) the difference of a square matrix and its transpose is a skew-symmetric
matrix.
Illustrate each of these results by an example.
9-22 Verify that (A B)' = B' A', given that
"-4 2"
3
3
2
-1
-2"
1
; (e)
"4"
3
1
0.
-0-
A =
1 4 7
9 -3 1
and B =
3 1
-5 6
9-23 If a matrix A contains complex numbers as elements it is said to be a complex
matrix. Its complex conjugate is denoted by A* and is defined to be the matrix
obtained from A by replacing each element by its complex conjugate. Show
from this and the definitions given in the text that :
(a) (A*)* = A;
(b) (A±B)* = A* ±B*;
(c) $(\mu\mathbf{A})^* = \bar{\mu}\mathbf{A}^*$, where $\mu$ is any complex number and $\bar{\mu}$ is its complex
conjugate.
9-24 Find the complex conjugates of the matrices A and B, where
A =
1 2 + 'i , „ r ' J - 2i i
and B =
- 2/ i J 11 + i 1 + i J
and, taking $\mu = 1 - i$, use them to verify the results of the previous problem.
Section 9-3
9-25 Evaluate the determinants v
(a)
1 3
1 2
; (b)
2 5
; (c)
4 7
1 3 7
-5 -5
9-26 Without expanding the determinant, prove that
$$\begin{vmatrix} 1 + a_1 & a_1 & a_1 \\ a_2 & 1 + a_2 & a_2 \\ a_3 & a_3 & 1 + a_3 \end{vmatrix} = (1 + a_1 + a_2 + a_3).$$
9-27 Use Theorem 9-4 to simplify the following determinants before expansion :
(a)
(0
A| =
A| =
42 61 50
3 2
4 6 5
2 1 5
5 17 56
4 1 7
(b) |A| =
9
16
2
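9-28 Without expanding prove that
$$\begin{vmatrix} x^2 + a_1^2 & a_1a_2 & a_1a_3 \\ a_2a_1 & x^2 + a_2^2 & a_2a_3 \\ a_3a_1 & a_3a_2 & x^2 + a_3^2 \end{vmatrix} = x^4(x^2 + a_1^2 + a_2^2 + a_3^2).$$
9-29 Show without expansion that
$$\begin{vmatrix} a^2 & b^2 & c^2 \\ a & b & c \\ 1 & 1 & 1 \end{vmatrix} = (a - b)(a - c)(b - c).$$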
This determinant is called an alternant determinant. Illustrate the result by
means of a numerical example and verify it by direct expansion.
9-30 Prove that
sin (x + Jit) sin x cos x
| A | = sin {x + Jit) cos x sin x
1 a 1 - a
is independent of a, and express it as a function of x.
9-31 Find the minors $M_{ij}$ and cofactors $A_{ij}$ of each element $a_{ij}$ in the matrix
i -! y
9-32 If A is an arbitrary matrix of order (3 x 3) with general element $a_{ij}$ and
cofactor $A_{ij}$, show by direct expansion that:
(a) $a_{11}A_{31} + a_{12}A_{32} + a_{13}A_{33} = 0$;
(b) $a_{13}A_{12} + a_{23}A_{22} + a_{33}A_{32} = 0$.
9-33 Use the Laplace expansion theorem to expand determinant (b) in Problem
9-25 first in terms of elements of the third row, and then in terms of elements
of the third column.
9-34 If A is a matrix of order (3 x 3) and A' is its transpose, prove by direct
expansion that
$$|\mathbf{A}| = |\mathbf{A}'|.$$
Use the Laplace expansion theorem to prove that this result is true for a
square matrix A of any order.
9-35 Verify by direct expansion that for any square matrices A and B of order
(2 X 2):
$$|\mathbf{A}\mathbf{B}| = |\mathbf{A}||\mathbf{B}|.$$
This result is, in fact, valid for square matrices A and B of any order.
Section 9-4
9-36 Which of the following sets of vectors are linearly independent, and where
linear dependence exists determine its form:
(a) Ci
(b) Ri = [1 9 -2 14], R 2 =[-2 -18 4 -28];
"3 _
" 0"
" 0"
, C 2 =
-7
, C3 =
.0.
0.
.15.
"2"
"1"
"1"
"5"
1
, C2 =
1
, C3 =
2
, C4 =
6
.0.
.2.
.1.
.4.
(c) Ci =
9-37 Test the following matrices for linear independence between their rows or
columns:
(a)
1 2
-1 0"
"
2
3
r
"1 2 1 5 -
2 3
1 1
1 1
2
; (b)
-2
-3
1
-1
2
-2
; (c)
2 12
10 2 1
1
2 3-
— 1
-2
2
0-
-5 3 7 7.
9-38 Find the rank of the following matrix:
2 1 4 3'
A= -1 2 4-2 6
7 -4 -12 14 -12
9-39 Construct an example of a matrix of order (4 x 3) which is (a) of rank 2, and
(b) of rank 3.
Section 9-5
9-40 Show that adj A = A when
-_4 _3 _3"
A= 1 1
.443.
9-41 Find the matrix adjoint to each of the following matrices:
(a)
9-42 Set
'1 2 3'
2 3 2
3 3 4
(b)
1 2 3'
1 3 4
1 4 3
(c)
a b
c d
caca-ca
and equate corresponding elements to determine the inverse of
ca-
9-43 Find the
inverse of
" 3 -2 -1"
A =
-4 1 -1
.201.
Verify that:
(a) $\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$;
(b) $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$.
9-44 Given that A and B are
"1 2 r
"1
-1
A =
1 4 2
and B =
2
_0 3 2.
.1
verify that $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.
Section 9-6
9-45 Find dA/df and determine the largest interval about the origin in which it is
defined, given that
AW =
~2t 3 tanr cosfl
.3 4-f« 1 + /J'
•46 Given that
A(0 =
"cosh t sinh /")
_sinh t coshtj
It < 2 J
verify results (d), (f), and (g) of Theorem 9-9.
9-47 Show that for the matrix
"cosf — sinr"
_sin t cos /_
it is true that $(d/dt)\mathbf{A}^2 = 2\mathbf{A}(d\mathbf{A}/dt)$, but that this is not true for the matrix
A(/) =
A(0
L2 /tj"
9-48 Show that if A(f ) is a non-singular matrix, then
At At
Verify this result when
fcos t — sin r~l
A= .
|_sin / cos tj
Section 9-7
9-49 Solve the following equations using Cramer's rule:
$x_1 + x_2 + x_3 = 7$
$2x_1 - x_2 + 2x_3 = 8$
$3x_1 + 2x_2 - x_3 = 11.$
9-50 Solve the equations of the previous example using the inverse matrix method
and compare the task with the previous method.
9-51 Solve the following equations using Cramer's rule:
$x_1 - x_2 + x_3 - x_4 = 1$
$2x_1 - x_2 + 3x_3 + x_4 = 2$
$x_1 + x_2 + 2x_3 + 2x_4 = 3$
$x_1 + x_2 + x_3 + x_4 = 3.$
9-52 Write down the augmented matrix corresponding to the equations :
$2x_1 - x_2 + 3x_3 = 1$
$3x_1 + 2x_2 - x_3 = 4$
$x_1 - 4x_2 + 7x_3 = 3.$
Show, by reducing this matrix to its echelon equivalent, that these equations
are inconsistent.
9-53 Write down the augmented matrix corresponding to the equations:
$3x_1 + 2x_2 - x_3 = 4$
$2x_1 - 5x_2 + 2x_3 = 1$
$5x_1 + 16x_2 - 7x_3 = 10.$
Show, by reducing this matrix to its echelon equivalent, that these equations
are consistent and solve them.
9-54 Solve the following equations, in which α is an arbitrary constant, by reducing
the augmented matrix to echelon form:
$x_1 + \alpha x_2 + \alpha x_3 = 1$
$\alpha x_1 + x_2 + 2\alpha x_3 = -4$
$\alpha x_1 - \alpha x_2 + 4x_3 = 2.$
Consider the effect of α on the solution.
9-55 Solve the following homogeneous equations, in which α is an arbitrary constant,
by reducing the augmented matrix to echelon form:
$\alpha x_1 - x_2 - x_3 = 0$
$-x_1 + \alpha x_2 - x_3 = 0$
$-x_1 - x_2 + \alpha x_3 = 0.$
Consider the effect of α on the solution.
9-56 Solve the following equations using Gaussian elimination :
$1.202x_1 - 4.371x_2 + 0.651x_3 = 19.447$
$-3.141x_1 + 2.243x_2 - 1.626x_3 = -13.702$
$0.268x_1 - 0.876x_2 + 1.341x_3 = 6.849.$
9-57 Discuss briefly, but do not solve, the following sets of equations:
(a) $x_1 + x_2 = 1$
$2x_1 - x_2 = 5$;
(b) $x_1 + x_2 = 1$
$2x_1 - x_2 = 5$
$x_1 - x_2 = 0$;
(c) $x_1 + x_2 = 1$
$2x_1 - x_2 = 5$
$-x_1 - 2x_2 = 0$;
(d) $x_1 + x_2 - x_3 = 0$
$2x_1 - x_2 - 5x_3 = 0.$
Section 9-8
9-58 Write down the characteristic equations for the following matrices:
(a) A =
"1 4"
1 1
■
(b) A =
"1 2"
2 11.
.0 2 1_
9-59 Find the eigenvalues and eigenvectors of
A =
' 1
-2
-r
0.
9-60 Prove that the eigenvalues of a diagonal matrix of any order are given by the
elements on the leading diagonal. What form do the eigenvectors take?
Section 9-9
9-61 Verify that the matrix A in Eqn (9-46) is orthogonal, and justify the assertion
that X = A X describes the effect of a general rotation of the rectangular
cartesian axes 0{xi, X2, X3}.
9-62 Justify the name reflection transformation of the plane when applied to a
transformation of the form
X = AX,
where either
A =
-1 01
.0 -lj
or A =
-1 01
lj
9-63 Show that if
X = AX,
Where
A =
"cos 6
_sin 6
—sin 6'
cos 9_
then
XX =
■- XX,
where the prime signifies the transpose operation. Interpret this result geo*
metrically.
9-64 If
X =
~x~
, x =
~x~
.y.
J-
„ , and A =
"_1_ _ J_
V2 V2
1 _1_
-V2 V2J
deduce the image of the curve y = x 2 under the transformation
X = AX.
Is the shape of the curve changed ?
9-65 If
X =
deduce the image of the curve y — x 2 '+ 2x + 1 under the transformation
X = AX.
Describe the effect of the transformation in geometrical terms.
x~
, x =
~x~
, and A =
"-3 0'
J-
J-
•0 3.
Section 9-10
9-66 How many dimensionless groups of variables ($\pi$ variables) characterize a
physical situation described by:
(a) the four physical quantities: work ($L^2MT^{-2}$), viscosity ($L^{-1}MT^{-1}$),
pressure ($L^{-1}MT^{-2}$) and mass transfer rate ($MT^{-1}$);
(b) the five physical quantities: length ($L$), viscosity ($L^{-1}MT^{-1}$), velocity
($LT^{-1}$), area ($L^2$) and pressure ($L^{-1}MT^{-2}$).
9-67 Express in matrix form the relationship between the differentials dx, dy and
du, dv, given that
$u = \sinh(x^3 + y^3)$, $v = \cosh(x^3 - y^3)$.
For what values of x and y does this transformation fail to have an inverse?
9-68 Given that
u = x 2 + ly + 1, v = X s - Ixy + y 3
and
p = sin (u + v), q = cos (u - v),
display in matrix form the relationship between the differentials dx, dy and
du, dv and between du, dv and dp, dq. Use matrix multiplication to express
directly the relationship between the differentials dx, dy and dp, dq.
9-69 Justify the matrix equations (9-55) and (9-56).
9-70 Verify that the square matrix in (9-55) is an orthogonal matrix.
9-71 Perform the calculations required in (9-59) to give the transformed stress
tensor components (9-60).
Functions of a complex
variable
10-1 Sequences of complex numbers and limits
When considering a definition of a sequence $\{z_n\}$ of complex numbers, we
should first examine to what extent the work of Chapter 3 on sequences of
real numbers is still relevant to complex sequences.
It will obviously be necessary to formulate new definitions, and this will
be our next task. However, since a sequence $\{u_n\}$ of real numbers is just a
special case of a sequence $\{z_n\}$ of complex numbers, any new definitions must
be compatible with the corresponding situations in Chapter 3 when related to
real sequences. Therefore, the behaviour of sequences of complex numbers
will be directly determined by the behaviour of the sequences of real numbers
that may be formed by considering separately the real and imaginary parts
of $\{z_n\}$. Thus if $z_n = [1 + (1/n)] + i(1/n^2)$ we would need to consider the
two real sequences $\{1 + (1/n)\}$ and $\{1/n^2\}$ associated with $\{z_n\}$.
Here we must note that expressions such as 'monotonic', 'finitely oscil-
lating', and 'bounded above' cannot be applied to sequences of complex
numbers as they cannot be ordered like the real numbers.
definition 10-1 (limit of complex sequence) The infinite sequence
$\{z_n\}$ of complex numbers $z_n = x_n + iy_n$ will be said to converge or tend to the
limit $\gamma = \mu + i\nu$ if, and only if, for every $\varepsilon > 0$ there exists a number N,
such that for n > N,
$$|\gamma - z_n| < \varepsilon.$$
When the sequence $\{z_n\}$ is convergent to $\gamma$ in this sense we shall write
$$\lim_{n\to\infty} z_n = \gamma.$$
This definition is easily seen to reduce to Definition 3-3 when applied to a
sequence of real numbers, for then the complex modulus and the absolute
value become identical in meaning.
The essential difference between Definitions 3-3 and 10-1 is embodied in
the following theorem.
theorem 10-1 (conditions for convergence) Let $\{z_n\}$ be an infinite sequence
of complex numbers $z_n = x_n + iy_n$. Then necessary and sufficient conditions
for
$$\lim_{n\to\infty} z_n = \gamma,$$
where $\gamma = \mu + i\nu$, are that
$$\lim_{n\to\infty} x_n = \mu \quad\text{and}\quad \lim_{n\to\infty} y_n = \nu.$$
Proof A paraphrase of this theorem would be that if $\{z_n\}$ converges to $\gamma$,
then the sequence of the real parts of $\{z_n\}$ converges to the real part of $\gamma$ and
the sequence of the imaginary parts of $\{z_n\}$ converges to the imaginary part
of $\gamma$. To establish the necessity of the conditions of the theorem suppose that
for some positive number $\varepsilon$, $|\gamma - z_n| < \varepsilon$ for n > N. Then
$$|\gamma - z_n| = |\mu + i\nu - (x_n + iy_n)| = |(\mu - x_n) + i(\nu - y_n)|,$$
and so by the definition of the modulus of a complex number,
$$|\gamma - z_n| = [(\mu - x_n)^2 + (\nu - y_n)^2]^{1/2}.$$
Neglecting first the non-negative term $(\nu - y_n)^2$, and then the non-negative term
$(\mu - x_n)^2$, shows that
$$|\gamma - z_n| \ge |\mu - x_n| \quad\text{and}\quad |\gamma - z_n| \ge |\nu - y_n|.$$
Hence $|\mu - x_n| < \varepsilon$ and $|\nu - y_n| < \varepsilon$ for n > N showing, by virtue of
Definition 3-3, that
$$\lim_{n\to\infty} x_n = \mu \quad\text{and}\quad \lim_{n\to\infty} y_n = \nu.$$
The sufficiency of these conditions is almost immediate. If
$$\lim_{n\to\infty} x_n = \mu \quad\text{and}\quad \lim_{n\to\infty} y_n = \nu,$$
then for any positive $\varepsilon$ choose N such that $|\mu - x_n| < \varepsilon$ and $|\nu - y_n| < \varepsilon$
for n > N. Then, as
$$|\gamma - z_n| = [(\mu - x_n)^2 + (\nu - y_n)^2]^{1/2},$$
it follows that
$$|\gamma - z_n| < \sqrt{2\varepsilon^2} = \varepsilon\sqrt{2}.$$
This establishes our result because $\varepsilon$ was arbitrary and so $|\gamma - z_n|$ can
always be made arbitrarily small by a suitable choice of $\varepsilon$.
The fact that a sequence of real numbers can only have one limit implies
the uniqueness of $\mu$ and $\nu$, and hence the uniqueness of $\gamma$. Consequently we
have arrived at the following result.
Corollary 10-1 If the sequence $\{z_n\}$ of complex numbers is convergent, then
it only has one limit.
Example 10-1 Examine each of the following sequences $\{z_n\}$ of complex
numbers for convergence and, where appropriate, determine the limit.
(a) $z_n = \left(1 + \dfrac{1}{n}\right) + i \sin\dfrac{n\pi}{2}$
(c) $z_n = n \sin\dfrac{2}{n} + i\sqrt{n}\,[\sqrt{n + 6} - \sqrt{n}]$.
Solutions We shall obtain our results by means of a direct application of
Theorem 10-1.
(a) Making the identifications
$$x_n = 1 + \frac{1}{n}, \quad y_n = \sin\frac{n\pi}{2}$$
we see that $\lim_{n\to\infty} x_n = 1$, whereas the sequence $\{y_n\}$ has no limit since $y_n$
assumes successively only the three values 1, 0, and -1. Hence the sequence
$\{z_n\}$ does not converge and so has no limit.
(b) Making the identifications
2n + 1 /n-\Y
-W
x n - -3^-, y
we see that
lim x n = f and lim y n = 1 .
n—*oo n~*<x>
Hence the sequence {z„} converges and
lim z n = f + /.
n— *-oo
As the numbers f and 1 are not members of their defining sequences {x n }
and {y n }, the complex limit y is not included as a member of the sequence
(c) Make the identifications
. 2
x n = n sin -, j„ = \/nW(n + 6) — \/n].
Then
lim ;r„ = 2
SEC 10-1 SEQUENCES OF COMPLEX NUMBERS AND LIMITS / 447
and
lim y„ = lim \/n . \/n
= lim«
1 +
= 3.
Thus the sequence {z n } converges and
lim z n = 2 + 3i.
n-*oo
For the same reason as in (b) above, the limit 2 + 3/ is not a member of the
sequence {z n }.
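Theorem 10-1 reduces the study of a complex sequence to that of its real and imaginary parts, and this is easy to observe numerically. The following sketch is only an illustration under stated assumptions: the function name `z_c` and the sample values of n are arbitrary, and a finite computation suggests rather than proves convergence. It lists the imaginary parts sin(nπ/2) of sequence (a), which cycle through 1, 0, -1, and shows the terms n sin(2/n) + i√n[√(n + 6) - √n] of sequence (c) approaching 2 + 3i.

```python
import math

# Sequence (a): imaginary part sin(n*pi/2), rounded to remove floating-point noise.
imag_parts = [round(math.sin(n * math.pi / 2)) for n in range(1, 9)]

def z_c(n):
    """Term n of sequence (c): n sin(2/n) + i sqrt(n)[sqrt(n + 6) - sqrt(n)]."""
    x_n = n * math.sin(2 / n)
    y_n = math.sqrt(n) * (math.sqrt(n + 6) - math.sqrt(n))
    return complex(x_n, y_n)

# Distance of selected terms from the claimed limit 2 + 3i.
errors = [abs(z_c(n) - (2 + 3j)) for n in (10, 1000, 100000)]

print(imag_parts)
print(errors)
```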
Arguments essentially similar to those given in Theorem 10-1 establish
results for complex sequences that are strictly analogous to those of
Theorem 3-1. We state them below without proof.
theorem 10-2 If it can be shown that $\{w_n\}$ and $\{z_n\}$ are two convergent
sequences of complex numbers with $\lim_{n\to\infty} w_n = \lambda$ and $\lim_{n\to\infty} z_n = \gamma$, then
(a) $w_1 + z_1, w_2 + z_2, w_3 + z_3, \ldots$ is a sequence such that
$$\lim_{n\to\infty}(w_n + z_n) = \lambda + \gamma;$$
(b) $w_1z_1, w_2z_2, w_3z_3, \ldots$ is a sequence such that
$$\lim_{n\to\infty} w_nz_n = \lambda\gamma;$$
(c) provided $\gamma \neq 0$, $w_1/z_1, w_2/z_2, w_3/z_3, \ldots$ is a sequence such that
$$\lim_{n\to\infty}\left(\frac{w_n}{z_n}\right) = \frac{\lambda}{\gamma}.$$
Example 10-2 If $w_n = [n(1 + i)/(n + 1)]$ and $z_n = (1/n) + [(n^2 + 1)/(2n^2 + 3)]i$,
find (a) $\lim_{n\to\infty}(w_n + z_n)$; (b) $\lim_{n\to\infty}(w_nz_n)$; and (c) $\lim_{n\to\infty}(w_n/z_n)$.
Solution By inspection we have
$$\lim_{n\to\infty} w_n = 1 + i \quad\text{and}\quad \lim_{n\to\infty} z_n = \tfrac{1}{2}i.$$
Hence by Theorem 10-2,
(a) $\lim_{n\to\infty}(w_n + z_n) = (1 + i) + \tfrac{1}{2}i = 1 + \tfrac{3}{2}i$;
(b) $\lim_{n\to\infty}(w_nz_n) = (1 + i)\tfrac{1}{2}i = \tfrac{1}{2}(i - 1)$;
(c) $\lim_{n\to\infty}\left(\dfrac{w_n}{z_n}\right) = \dfrac{1 + i}{\tfrac{1}{2}i} = 2 - 2i$.
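The three limits of Example 10-2 can be checked with Python's built-in complex arithmetic. This is a minimal sketch under the assumption that a single large value of n gives an adequate approximation to the limit; the function names `w` and `z` are simply the sequences of the example.

```python
def w(n):
    """w_n = n(1 + i)/(n + 1)."""
    return n * (1 + 1j) / (n + 1)

def z(n):
    """z_n = 1/n + i(n^2 + 1)/(2n^2 + 3)."""
    return complex(1 / n, (n * n + 1) / (2 * n * n + 3))

n = 10**6  # a large index, chosen arbitrarily

sum_err = abs(w(n) + z(n) - (1 + 1.5j))        # limit (a): 1 + (3/2)i
prod_err = abs(w(n) * z(n) - 0.5 * (1j - 1))   # limit (b): (1/2)(i - 1)
quot_err = abs(w(n) / z(n) - (2 - 2j))         # limit (c): 2 - 2i

print(sum_err, prod_err, quot_err)
```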
If the terms of a sequence are plotted as points in the complex plane,
Theorem 10-1 and its Corollary imply that when that sequence is convergent,
it will have the property that with increasing n its points will cluster ever
closer to the one point that represents its limit y. In other cases it may happen
that although a sequence is not convergent, nevertheless when its terms are
plotted in the complex plane they cluster around two or more distinct points.
By analogy with sequences of real numbers, these points will be called limit
points of the complex sequence, and for their definition they require the
notion of a neighbourhood of a point.
Accordingly, we shall use the term a neighbourhood of the point $\zeta$ in the
complex plane to mean the interior of any circle centred on $\zeta$. This idea
enables us to define a limit point.
definition 10-2 (limit point) The point $\zeta$ will be called a limit point of
the sequence $\{z_n\}$ of complex numbers if every neighbourhood of $\zeta$ contains
at least one point of $\{z_n\}$ other than the point $\zeta$ itself.
It is an immediate consequence of this definition that every neighbourhood
of a limit point $\zeta$ of $\{z_n\}$ contains an infinite number of points of $\{z_n\}$. We
again emphasize that Theorem 10-1 together with its Corollary imply that a
convergent sequence $\{z_n\}$ of complex numbers can have only one limit point.
Example 10-3 Identify the limit points of the sequence $\{z_n\}$ where
$$z_n = \left(2 - \frac{1}{n}\right) + i(-1)^n\left(1 + \frac{1}{n}\sin\frac{n\pi}{2}\right).$$
Solution Make the identifications
$$x_n = 2 - \frac{1}{n} \quad\text{and}\quad y_n = (-1)^n\left(1 + \frac{1}{n}\sin\frac{n\pi}{2}\right).$$
Then $\{x_n\}$ converges to the limit 2 and thus has one limit point, whilst $\{y_n\}$
does not converge but has the two limit points 1 and -1. Hence the sequence
$\{z_n\}$ has the two limit points 2 + i and 2 - i.
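The two limit points of Example 10-3 show up clearly when the even-indexed and odd-indexed terms are examined separately. The sketch below is an illustration only; the particular indices are arbitrary, and the computation indicates the clustering rather than proving it.

```python
import math

def z(n):
    """Term n of the sequence in Example 10-3."""
    x_n = 2 - 1 / n
    y_n = (-1) ** n * (1 + math.sin(n * math.pi / 2) / n)
    return complex(x_n, y_n)

# Even-indexed terms cluster at 2 + i, odd-indexed terms at 2 - i.
even_gap = abs(z(10**6) - (2 + 1j))
odd_gap = abs(z(10**6 + 1) - (2 - 1j))

print(even_gap, odd_gap)
```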
10-2 Curves and regions
The notions of a curve and a region in the real plane may be immediately
extended to the complex plane. As a closed and not necessarily smooth curve
is a connected set of points which serves to de-limit two areas of the plane,
which we shall call the interior and exterior regions relative to that curve, we
ought first to define a curve C in the complex plane. It is frequently convenient
to give a parametric representation by expressing C as the set of points
$$z = x(s) + iy(s) \quad\text{for } a \le s \le b, \tag{10-1}$$
where x(s) and y(s) are continuous real functions of the parameter s. It
should be apparent from Section 2-5 and subsequent work that the require-
ment of continuity for the real functions x(s), y(s) will ensure that C is a
continuous curve (that is, unbroken), but that it does not necessarily possess
a tangent at every point. As a simple illustration C might be a rectangle, for
then tangents would not be defined at the corners though the curve would be
continuous everywhere. We shall return to these general matters later when
a continuous function of a complex variable has been defined. For
conciseness let us henceforth call such curves C, continuous curves.
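For a less trivial example, suppose that the curve C in the complex plane
is defined by z = x(s) + iy(s), where
$$x(s) = \sin s \quad\text{for } -\tfrac{1}{2}\pi \le s \le \tfrac{3\pi}{2},$$
$$y(s) = \begin{cases} \sin^2 s & \text{for } -\tfrac{1}{2}\pi \le s \le \tfrac{1}{2}\pi \\ 1 & \text{for } \tfrac{1}{2}\pi \le s \le \tfrac{3\pi}{2}. \end{cases}$$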
Fig. 10-1 Continuous curve C having no tangent defined at P and Q.
Then it is readily seen that C is the continuous closed curve comprising the
parabola $y = x^2$ in the interval $-1 \le x \le 1$, together with the points of the
line y = 1 common to that same interval. The curve C is shown in Fig. 10-1
and it is continuous everywhere, though it is not smooth everywhere for no
tangent can be defined at points P and Q. The darkly shaded area in that
Figure comprises points which are interior relative to C and form the interior
region, whilst the lightly shaded area comprises points which are exterior
relative to C and form the exterior region. When speaking in terms of regions,
the points comprising the curve C itself are usually called the boundary
points and they may, or may not, belong to a region.
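The parametric description of the curve of Fig. 10-1 can be sampled directly. In the sketch below the parameter interval is assumed to be the closed interval from -π/2 to 3π/2, with x(s) = sin s throughout, y(s) = sin² s on the first half and y(s) = 1 on the second half; the function name and the number of sample points are arbitrary. The check confirms that every sampled point lies either on the parabola y = x² or on the line y = 1, and that the curve closes on itself.

```python
import math

def curve_point(s):
    """Point z = x(s) + i y(s) of the curve C (closed parameter interval assumed)."""
    x = math.sin(s)
    y = math.sin(s) ** 2 if s <= math.pi / 2 else 1.0
    return complex(x, y)

# 201 equally spaced parameter values from -pi/2 to 3*pi/2.
samples = [curve_point(-math.pi / 2 + 2 * math.pi * k / 200) for k in range(201)]

# Each point lies on the parabola y = x^2 or on the line y = 1.
on_curve = all(abs(p.imag - p.real ** 2) < 1e-12 or p.imag == 1.0 for p in samples)

# The first and last points coincide, so the curve is closed.
closure_gap = abs(samples[0] - samples[-1])

print(on_curve, closure_gap)
```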
A parametric representation of a curve C is not always the most convenient
method for its description in the complex plane and, on occasions, it is better
to identify the points z comprising a curve directly in terms of z itself. When
necessary, regions are usually defined in the complex plane by means of a
combination of curves and inequalities, as was done in the real plane.
Example 10-4 Describe the curve C defined by the equation
$$|z - 2| = \tfrac{3}{2}$$
and use the result to define the region exterior to C.
Solution This expression defines a connected set of points that all have a
modulus 3/2 relative to the point z = 2 as origin, that is to say, the set of
points which are all distant 3/2 from the point z = 2. Hence the equation
$|z - 2| = 3/2$ describes a circle C of radius 3/2 centred on the point z = 2.
Algebraically, the same result is obtained by writing z = x + iy, when
$|z - 2| = |(x - 2) + iy|$, so that from the definition of the modulus of a
complex number, $|z - 2| = 3/2$ is seen to be equivalent to the algebraic
equation $(x - 2)^2 + y^2 = 9/4$. This is a circle of radius 3/2 centred on the
point (2, 0). The region exterior to C is the entire complex plane less the
points lying in and on this circle.
Example 10-5 Describe the region interior to and including the curve C
defined by
$$\arg(z - 1) - \arg(z - i) = \tfrac{1}{2}\pi,$$
and also satisfying the inequalities
i < Re z < f and Im z > 0.
Solution Consider the construction in Fig. 10-2 (a) in which P is the point
z = 1, Q is the point z = i and R is a general point z.
Simple geometrical arguments then establish that the angle $\gamma$ is related
to the angles $\alpha$ and $\beta$ by the equation
$$\gamma = \pi + \alpha - \beta.$$
However, the line PR is the vector z - 1, whilst the line QR is the vector
z - i, so that arg (z - i) = $\alpha$ and arg (z - 1) = $\beta$. Since by the conditions
of the problem we must have $\beta - \alpha = \tfrac{1}{2}\pi$, it follows that $\gamma = \tfrac{1}{2}\pi$. The angle
QRP is thus a right angle and hence the curve C must be a semi-circle drawn
SEC 10-2
CURVES AND REGIONS / 451
A
k
y Complex plane
f)
<^f\<l
M
/HA
p
x>
o
Complex plane
\
I
H
(a)
i I I
(b)
Fig. 10-2 Region in complex plane: (a) boundary curve; (b) region interior to C
and satisfying stated inequalities.
from P to Q with PQ as its diameter. The semi-circle must lie above the
diameter PQ, since were.the general point R to be taken below that line the
equation relating the arguments would no longer be satisfied. To define
the lower semi-circle the following condition would be needed :
arg - 1) - arg (z - = -\n.
To complete the solution to the problem it is now necessary to interpret
the inequalities. The inequality \ < Re z < f describes the narrow strip
bounded by the lines x = J and x = f , with the points of the line x = J
excluded from consideration. The inequality $\operatorname{Im} z \ge 0$ is the half plane above
and including the x-axis itself. Figure 10-2 (b) presents a composite diagram
with the shaded area representing the region satisfying all the conditions of
the problem. Boundary points belonging to the region are indicated by a
heavy line and those excluded by a dotted line.
Notice from this and the previous example that there is more than one
way of specifying a given curve and region. The condition
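$$\arg(z - 1) - \arg(z - i) = \tfrac{1}{2}\pi$$
is an alternative expression of the condition
$$\left| z - \tfrac{1}{2} - \tfrac{1}{2}i \right| = \frac{1}{\sqrt{2}} \quad \text{with } \operatorname{Re} z > 0,\ \operatorname{Im} z > 0,$$
which, in turn, is an alternative expression of the algebraic condition
$$(x - \tfrac{1}{2})^2 + (y - \tfrac{1}{2})^2 = \tfrac{1}{2} \quad \text{with } x > 0,\ y > 0.$$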
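The equivalence of these three descriptions can be spot-checked numerically. The following is only a sketch using Python's standard library; the names `CENTRE`, `RADIUS`, `point_on_circle` and `arg_difference` are illustrative and do not come from the text. It samples points of the circle on the arc above the diameter PQ and evaluates the argument condition and the algebraic condition at each.

```python
import cmath
import math

CENTRE = 0.5 * (1 + 1j)          # midpoint of PQ, where P is z = 1 and Q is z = i
RADIUS = 1 / math.sqrt(2)        # half the length |1 - i| of the diameter PQ


def point_on_circle(t):
    """Point of the circle |z - CENTRE| = RADIUS at angle t measured about CENTRE."""
    return CENTRE + RADIUS * cmath.exp(1j * t)


def arg_difference(z):
    """arg(z - 1) - arg(z - i), using principal values of the argument."""
    return cmath.phase(z - 1) - cmath.phase(z - 1j)


# Angles strictly between -pi/4 (the point P) and 3*pi/4 (the point Q)
# sweep the arc lying above the diameter PQ.
upper_arc = [point_on_circle(-math.pi / 4 + k * math.pi / 10) for k in range(1, 10)]

for z in upper_arc:
    x, y = z.real, z.imag
    algebraic = (x - 0.5) ** 2 + (y - 0.5) ** 2      # left side of the algebraic condition
    print(f"z = {z:.3f}  arg difference = {arg_difference(z):.6f}  algebraic = {algebraic:.6f}")
```

Each sampled point has positive real and imaginary parts, and the printed argument difference should agree with $\tfrac{1}{2}\pi$ to rounding error.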
10-3 Function of a complex variable, limits, and
continuity
In Chapter 2 we used the term 'a real valued function of a real variable' to
mean any rule that associates with each real number from the domain of
definition of the function a unique real number from the range of that function.
Symbolically, if D denotes the set of points in the domain of a function
f, and R denotes the set of points in the range of f, this relationship or mapping
is given by
R = f(D).
These ideas still hold good when the domain D and the range R include
complex numbers. Thus if z is any point in D, and w is the unique number
assigned to z by the function f, we write
$$w = f(z). \tag{10-2}$$
The number z = x + iy is allowed to assume any value in D and so, if
desired, could be called a complex independent variable, when w could then
properly be called a complex dependent variable. Usually we shall simply
refer to z and w as complex variables. It must be appreciated that, like z, the
variable w has a real part and an imaginary part, both of which are in general
dependent on x and y through the variable z = x + iy. We summarize these
ideas formally as follows.
Definition 10-3 (function of a complex variable) We shall say that f is a
function of the complex variable z = x + iy, and write
$$w = f(z),$$
if f associates a unique complex number w = u + iv with each complex
number z belonging to some region D of the complex plane.
Specific examples of functions of a complex variable are:
(a) $w = iz + 1$; (b) $w = z\bar{z}$; (c) $w = z^2 + 2z + 1$; (d) $w = 1/(z - 2)$;
(e) $w = \sin z$.
With the exception of (d), which is not defined for z = 2, these functions are
defined for all z.
The difference between a function of a complex variable and a real valued
function of a real variable is made clear by expressing these examples in real
and imaginary form. Thus writing z = x + iy and w = u + iv we find:
(a) $w = i(x + iy) + 1 = (1 - y) + ix$, showing that $u = 1 - y$, $v = x$;
(b) $w = (x + iy)(x - iy) = x^2 + y^2$, showing that $u = x^2 + y^2$, $v = 0$.
This is an example of a function that always maps a complex variable
into a real variable.
(c) $w = (x + iy)^2 + 2(x + iy) + 1 = (x^2 + 2x - y^2 + 1) + i(2y + 2xy)$,
showing that $u = x^2 + 2x - y^2 + 1$, $v = 2y(1 + x)$;
(d) $w = 1/(x + iy - 2) = [(x - 2) - iy]/(x^2 + y^2 - 4x + 4)$, showing
that $u = (x - 2)/(x^2 + y^2 - 4x + 4)$, $v = -y/(x^2 + y^2 - 4x + 4)$, provided
only that x = 2 and y = 0 do not hold together, that is, $z \ne 2$;
(e) $w = \sin z = \sin(x + iy) = \sin x \cos iy + \cos x \sin iy$, and so using
the results of Problem 6-33, that $\cos iy = \cosh y$, $\sin iy = i \sinh y$,
we arrive at $w = \sin x \cosh y + i \cos x \sinh y$. Thus in this case
$u = \sin x \cosh y$, $v = \cos x \sinh y$.
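As a quick check on the algebra, the real and imaginary parts found in (a) to (e) can be compared with direct complex arithmetic. This is a minimal sketch using only the standard library; the table `EXAMPLES`, the helper `max_mismatch` and the sample points are illustrative choices, not part of the text.

```python
import cmath
import math

# Each entry: (label, f(z), u(x, y), v(x, y)) as stated in examples (a) to (e).
EXAMPLES = [
    ("a", lambda z: 1j * z + 1,
          lambda x, y: 1 - y,
          lambda x, y: x),
    ("b", lambda z: z * z.conjugate(),
          lambda x, y: x * x + y * y,
          lambda x, y: 0.0),
    ("c", lambda z: z * z + 2 * z + 1,
          lambda x, y: x * x + 2 * x - y * y + 1,
          lambda x, y: 2 * y * (1 + x)),
    ("d", lambda z: 1 / (z - 2),
          lambda x, y: (x - 2) / (x * x + y * y - 4 * x + 4),
          lambda x, y: -y / (x * x + y * y - 4 * x + 4)),
    ("e", cmath.sin,
          lambda x, y: math.sin(x) * math.cosh(y),
          lambda x, y: math.cos(x) * math.sinh(y)),
]

# Arbitrary sample points, chosen to avoid z = 2 where (d) is undefined.
POINTS = [(0.3, -1.2), (1.5, 0.7), (-2.0, 2.5)]


def max_mismatch(f, u, v, points):
    """Largest |f(x + iy) - (u + iv)| over the sample points."""
    return max(abs(f(complex(x, y)) - complex(u(x, y), v(x, y))) for x, y in points)


for label, f, u, v in EXAMPLES:
    print(label, max_mismatch(f, u, v, POINTS))
```

Every mismatch should be at the level of floating-point rounding.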
Any function of x, y and complex constants that gives rise to a unique
complex number when x and y are specified defines a function of the complex
variable z by virtue of the relationship z = x + iy. For suppose that
$$(x + y + 1) + i(x - 2y) = f(z),$$
then to determine f(z) when z = 1 + 2i we simply write x + iy = 1 + 2i,
showing that x = 1, y = 2, after which it follows from the form of f(z)
that f(1 + 2i) = 4 - 3i.
Our Definition 10-1 of a limit of a sequence of complex numbers extends
without difficulty to include the concept of a limit of a function of a complex
variable. In essence, we shall say that f(z) has the limit $w_0$ as $z \to z_0$ and will
write
$$\lim_{z \to z_0} f(z) = w_0 \tag{10-3}$$
when, for any small $\varepsilon > 0$, we can always ensure that $|f(z) - w_0| < \varepsilon$ by
confining z to some suitably small circular neighbourhood $|z - z_0| < \delta$
of the point $z_0$. That is to say f(z) can be made arbitrarily close to $w_0$ by
taking z sufficiently close to $z_0$, irrespective of the manner of approach of
z to $z_0$. As in the real variable case, we do not require that f(z) be defined at
$z_0$ or, if it is, that $f(z_0)$ should equal $w_0$. Expressed formally this becomes:
Definition 10-4 (limit of a function of a complex variable) The function
f(z) will be said to tend to the limit $w_0$ as $z \to z_0$, and we shall write
$$\lim_{z \to z_0} f(z) = w_0,$$
if, and only if, for any $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$|f(z) - w_0| < \varepsilon \quad \text{when} \quad |z - z_0| < \delta \text{ with } z \ne z_0.$$
This form of statement should be compared with that in Definition 3-8
relating to a real valued function of two real variables. There is no essential
difference, since the complex modulus is equal to the distance function $\rho$
used in that definition.
Example 10-6 Prove that
$$\lim_{z \to 1 + i} (z^2 + 1) = 1 + 2i.$$
Solution The result is self-evident since the function $f(z) = z^2 + 1$ is
uniquely defined for all z, but let us prove it using Definition 10-4.
$$|f(z) - (1 + 2i)| = |z^2 - 2i| = |[z - (1 + i)][z + (1 + i)]|$$
and from the properties of the modulus this becomes
$$\begin{aligned}
|f(z) - (1 + 2i)| &= |z - 1 - i| \cdot |z + 1 + i| \\
&= |z - 1 - i| \cdot |z - 1 - i + 2(1 + i)| \\
&\le |z - 1 - i| \, \{ |z - 1 - i| + 2|1 + i| \}.
\end{aligned}$$
Hence we may make $|f(z) - (1 + 2i)| < \varepsilon$, where $\varepsilon > 0$ is arbitrarily
small, provided that we choose the number $\delta > 0$ such that $|z - 1 - i| < \delta$
and $\delta\{\delta + 2|1 + i|\} < \varepsilon$. The conditions of Definition 10-4 are satisfied,
thereby establishing that 1 + 2i is the limit. In other words, as z approaches
the value 1 + i, so the function $f(z) = z^2 + 1$ approaches the number
1 + 2i, which is its limit. In this case it also happens to be true that
$\lim_{z \to z_0} f(z) = f(z_0)$.
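The choice of $\delta$ in Example 10-6 can be made concrete. The sketch below is illustrative only: `delta_for` is one possible choice (not the only one) satisfying $\delta\{\delta + 2|1 + i|\} < \varepsilon$, obtained by additionally restricting $\delta \le 1$, and `worst_error` samples points inside the resulting neighbourhood of 1 + i.

```python
import cmath
import math

Z0 = 1 + 1j          # point being approached
LIMIT = 1 + 2j       # claimed limit of f(z) = z^2 + 1


def f(z):
    return z * z + 1


def delta_for(eps):
    """One delta satisfying delta * (delta + 2|1 + i|) < eps.

    If delta <= 1 then delta * (delta + bound) <= delta * (1 + bound),
    so half of eps / (1 + bound) is certainly small enough.
    """
    bound = 2 * abs(1 + 1j)
    return 0.5 * min(1.0, eps / (1 + bound))


def worst_error(eps, samples=360):
    """Largest |f(z) - LIMIT| found on sampled points with |z - Z0| < delta."""
    d = delta_for(eps)
    worst = 0.0
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        for fraction in (0.25, 0.5, 0.999):
            z = Z0 + fraction * d * cmath.exp(1j * angle)
            worst = max(worst, abs(f(z) - LIMIT))
    return worst


for eps in (1.0, 0.1, 1e-3):
    print(eps, delta_for(eps), worst_error(eps))
```

In each case the sampled error stays below the chosen $\varepsilon$, whatever the direction of approach.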
Example 10-7 Prove that
$$\lim_{z \to 2i} \left( \frac{z^2 + 4}{z - 2i} \right) = 4i.$$
Solution Unlike the previous situation, the function $f(z) = (z^2 + 4)/(z - 2i)$
is not defined when z = 2i. To establish the desired result we notice that
$$|f(z) - 4i| = \left| \frac{z^2 + 4}{z - 2i} - 4i \right| = \left| \frac{z^2 - 4iz - 4}{z - 2i} \right| = \left| \frac{(z - 2i)^2}{z - 2i} \right| = |z - 2i|.$$
Thus we can ensure that $|f(z) - 4i| < \varepsilon$ by taking $|z - 2i| < \delta$, where
here $\delta = \varepsilon$. The conditions of Definition 10-4 are satisfied, and thus we have
established that
$$\lim_{z \to 2i} \left( \frac{z^2 + 4}{z - 2i} \right) = 4i,$$
despite the fact that the function $f(z) = (z^2 + 4)/(z - 2i)$ is not defined at
z = 2i.
The results of Theorem 10-2 generalize to give limit theorems for functions
of a complex variable.
Theorem 10-3 (operations on limits of complex functions) If f(z) and
g(z) are two complex functions for which
$$\lim_{z \to z_0} f(z) = v_0 \quad \text{and} \quad \lim_{z \to z_0} g(z) = w_0,$$
then
(a) $\lim_{z \to z_0} [f(z) + g(z)] = v_0 + w_0$;
(b) $\lim_{z \to z_0} f(z)\,g(z) = v_0 w_0$;
(c) provided $w_0 \ne 0$, $\lim_{z \to z_0} [f(z)]/[g(z)] = v_0/w_0$.
The proofs of these results follow directly from Definition 10-4 and are
left to the reader.
Example 10-8 Apply the results of Theorem 10-3 to the functions
$f(z) = z^2 + 2z + 1$ and $g(z) = 1 - iz$ to determine the limits of $f(z) + g(z)$,
$f(z)\,g(z)$, and $f(z)/g(z)$ as $z \to i$.
Solution The functions f(z) and g(z) are defined for all z and so it is easily
seen that
$$\lim_{z \to i} f(z) = \lim_{z \to i} (z^2 + 2z + 1) = 2i$$
and
$$\lim_{z \to i} g(z) = \lim_{z \to i} (1 - iz) = 2.$$
These results, which have been obtained by direct substitution, may be verified
by using Definition 10-4, as in Example 10-3. Results (a), (b), and (c) of
Theorem 10-3 may thus be applied to yield:
(a) $\lim_{z \to i} [f(z) + g(z)] = 2(1 + i)$;
(b) $\lim_{z \to i} f(z)\,g(z) = 4i$;
(c) as $\lim_{z \to i} g(z) = 2 \ne 0$, $\lim_{z \to i} f(z)/g(z) = i$.
It is now a simple step to extend the idea of continuity for, as with real
valued functions of a real variable (cf. Definition 3-9), we shall say that the
function f(z) is continuous at $z_0$ if $\lim_{z \to z_0} f(z) = w_0$ exists and $f(z_0) = w_0$. We
thus arrive at the following statement.
Definition 10-5 (continuity of a function of a complex variable) The
complex function f(z) will be said to be continuous at $z_0$ if:
(a) $\lim_{z \to z_0} f(z) = w_0$ exists,
and
(b) $f(z_0) = w_0$.
A complex function will be said to be continuous in a region of the complex
plane if it is continuous at all points of that region.
Example 10-9 Prove that the function f(z) = a + bz is continuous
everywhere.
Solution If $z_0$ is any complex number we have
$$|f(z) - f(z_0)| = |a + bz - a - bz_0| = |b|\,|z - z_0|,$$
so that for any $\varepsilon > 0$,
$$|f(z) - f(z_0)| < \varepsilon \quad \text{if} \quad |z - z_0| < \delta$$
provided that we take $\delta = \varepsilon/|b|$ (when b = 0 any $\delta > 0$ will do). We have proved that
$$\lim_{z \to z_0} f(z) = a + bz_0,$$
which is condition (a) of Definition 10-5. Condition (b) is obviously true as
$f(z_0) = a + bz_0$ for all $z_0$. As $z_0$ was arbitrary, it follows that we have
proved the required property of continuity for f(z). Notice that by first setting
b = 0 and then setting a = 0, b = 1, the continuity of the functions f(z) = a
(constant) and f(z) = z follows as special cases.
Example 10-10 Prove that the function $f_n(z) = z^n$, where n is a positive
integer, is continuous for all z.
Solution The proof is by induction. In the previous example we proved as a
special case the continuity of $f_1(z) = z$. If we assume that $f_m(z)$ is continuous,
then since $f_{m+1}(z) = z^{m+1} = z \cdot z^m = f_1(z) \cdot f_m(z)$, it follows directly from
Theorem 10-3 (b) that $f_{m+1}(z)$ is continuous. Thus if P(m) is the property that
$f_m(z)$ is continuous, we have proved directly that P(1) is true and also that if
P(m) is true, then so also is P(m + 1). Hence it follows by induction that
P(m) is true for all m, which establishes our result.
Further use of Definition 10-5 coupled with Theorem 10-3 makes it a
straightforward matter to establish many other important and useful results
concerning continuity. Typical of results that follow from such reasoning
are that a complex polynomial
$$P(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n$$
is continuous everywhere, whilst a complex rational function
$$\frac{a_0 + a_1 z + a_2 z^2 + \cdots + a_m z^m}{b_0 + b_1 z + b_2 z^2 + \cdots + b_n z^n}$$
is continuous everywhere except at the n zeros of the denominator.
It is interesting to give an alternative proof of the continuous nature of a
polynomial P(z). As z = x + iy, it follows that we may express P(z) in the
form
$$P(z) = Q_1(x, y) + iQ_2(x, y),$$
where $Q_1(x, y)$ and $Q_2(x, y)$ are real polynomial functions each with general
terms of the form $x^s y^t$ in which s, t are either zero or positive integers. Now
from the behaviour of real functions of two real variables we know that
$Q_1(x, y)$ and $Q_2(x, y)$ must be continuous functions of x and y everywhere
in the plane.
However, if $z_1$ and $z_2$ are any two points with $z_1 = x_1 + iy_1$ and
$z_2 = x_2 + iy_2$, then
$$\begin{aligned}
|P(z_2) - P(z_1)| &= |Q_1(x_2, y_2) - Q_1(x_1, y_1) + i[Q_2(x_2, y_2) - Q_2(x_1, y_1)]| \\
&\le |Q_1(x_2, y_2) - Q_1(x_1, y_1)| + |Q_2(x_2, y_2) - Q_2(x_1, y_1)|.
\end{aligned}$$
Now as $Q_1(x, y)$ and $Q_2(x, y)$ are continuous, it is true that
$$\lim_{\substack{x_2 \to x_1 \\ y_2 \to y_1}} Q_1(x_2, y_2) = Q_1(x_1, y_1) \quad \text{and} \quad \lim_{\substack{x_2 \to x_1 \\ y_2 \to y_1}} Q_2(x_2, y_2) = Q_2(x_1, y_1),$$
and so $|P(z_2) - P(z_1)|$ may be made arbitrarily small by taking $z_2$ sufficiently
close to $z_1$. This proves our assertion of the continuity of P(z) for all z, since
$z_1$, $z_2$ were arbitrary points in the complex plane.
Obvious extensions of the other continuity theorems proved for real
variables are also possible and the most useful ones are summarized below
without further proof.
Theorem 10-4 (continuity theorem for complex functions) If f(z) and
g(z) are two complex functions each continuous at $z = z_0$, then
(a) f(z) + g(z) is continuous at $z_0$;
(b) f(z) g(z) is continuous at $z_0$;
(c) f(z)/g(z) is continuous at $z_0$ provided $g(z_0) \ne 0$;
(d) if f(w) is continuous at $w = w_0$, and w = g(z) is continuous at $z = z_0$,
with $w_0 = g(z_0)$, then the composite function (function of a function)
f[g(z)] is continuous at $z = z_0$.
It is, for example, condition (d) of this theorem that validates the assertion
that $(z^2 + 3z + 2)^3$ is continuous everywhere. (Why?)
10-4 Derivatives — Cauchy-Riemann equations
Thus far the related concepts of a limit, a function, and of continuity
have been successfully extended to include a function of a complex variable.
It is now reasonable to attempt to generalize the notion of a derivative, and
at this point we encounter a major dissimilarity between a function of a
complex variable and a real valued function of two real variables. Indeed,
whereas we have already seen that most real valued functions of two real
variables are partially differentiable with respect to those variables, it will
shortly be shown that the operation of differentiation can only be defined
for a very special class of complex functions. Before discovering the exact
nature of the restriction on a complex function if it is to be differentiable,
we must extend our definition of a derivative in a manner compatible with
the real variable case.
Definition 10-6 (derivative of a complex function) Let w = f(z) be
defined in some neighbourhood of the point $z = z_0$ and let |h| be sufficiently
small for $z = z_0 + h$ to lie within this neighbourhood. Then, if the difference
quotient
$$\frac{f(z_0 + h) - f(z_0)}{h}$$
tends to the limit $\gamma$ as $|h| \to 0$, we shall call $\gamma$ the derivative of f(z) at $z_0$ and
will write either
$$f'(z_0) = \gamma \quad \text{or} \quad \left. \frac{dw}{dz} \right|_{z = z_0} = \gamma.$$
If this difference quotient has a limit for all points $z_0$ of some region in which
w = f(z) is defined, then f(z) will be said to be differentiable in that region.
The derivative, as a function of a general point z, will be denoted either by
f'(z) or dw/dz.
Alternatively expressed, this definition asserts that the complex number
$\gamma$ is the derivative of w = f(z) at $z = z_0$ if, for every $\varepsilon > 0$, there exists a $\delta > 0$
such that
$$\left| \frac{f(z_0 + h) - f(z_0)}{h} - \gamma \right| < \varepsilon \quad \text{for } 0 < |h| < \delta.$$
Notice that although h is small when |h| is small, the condition $|h| \to 0$
that is imposed in our definition of a derivative requires the limit defining
the derivative to exist for all possible methods of approach of h towards zero.
This means that if the derivative is to exist, then it must be independent of
the manner in which $h \to 0$. This is a vitally important feature of the definition
and one to which we shall return.
Example 10-11 Prove that if n > 0 is an integer and $w = z^n$, then
$$\frac{dw}{dz} = nz^{n-1}$$
for all z.
Solution Consider some point $z_0$ and form the difference quotient
$$\frac{(z_0 + h)^n - z_0^n}{h},$$
where h is any complex number. Then by the binomial theorem
$$\frac{(z_0 + h)^n - z_0^n}{h} = \frac{z_0^n + nhz_0^{n-1} + \frac{n(n-1)}{2!} h^2 z_0^{n-2} + \cdots + h^n - z_0^n}{h}$$
and thus
$$\frac{(z_0 + h)^n - z_0^n}{h} = nz_0^{n-1} + \frac{n(n-1)}{2!} h z_0^{n-2} + \cdots + h^{n-1}.$$
Now as $|h| \to 0$ implies $h \to 0$, taking the limit of this expression as $|h| \to 0$
we arrive at the derivative of the function $w = z^n$ at the point $z_0$:
$$\lim_{|h| \to 0} \left[ \frac{(z_0 + h)^n - z_0^n}{h} \right] = nz_0^{n-1}.$$
Since the point $z_0$ was arbitrary this result is true for all $z_0$, and so the
function is differentiable for all z and
$$\frac{d(z^n)}{dz} = nz^{n-1}.$$
A more subtle argument shows that this result is, in fact, true for any value
of n and not just for n a positive integer.
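The independence of the limit from the direction of approach can be seen numerically. The following sketch is an illustration only; the exponent n = 5, the point $z_0$ and the step size are arbitrary assumptions. It evaluates the difference quotient of $z^n$ for small h taken along several directions and compares each value with $nz_0^{n-1}$.

```python
import cmath
import math


def difference_quotient(f, z0, h):
    """The quotient (f(z0 + h) - f(z0)) / h of Definition 10-6."""
    return (f(z0 + h) - f(z0)) / h


def quotients_by_direction(f, z0, size=1e-6, directions=8):
    """Difference quotients for h = size * e^{i*angle} at equally spaced angles."""
    return [
        difference_quotient(f, z0, size * cmath.exp(2j * math.pi * k / directions))
        for k in range(directions)
    ]


n = 5
z0 = 0.8 - 0.6j
exact = n * z0 ** (n - 1)

for q in quotients_by_direction(lambda z: z ** n, z0):
    print(q, abs(q - exact))
```

The discrepancies are of the order of |h|, as the second term of the binomial expansion suggests, and none of the directions is singled out.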
Example 10-12 Prove that if $w = \sin z$, then
$$\frac{dw}{dz} = \cos z$$
for all z.
Solution Let $z_0$ be any value of z and form the difference quotient
$$\frac{\sin(z_0 + h) - \sin z_0}{h},$$
where h is any complex number. Then using a familiar trigonometric identity
we have
$$\frac{\sin(z_0 + h) - \sin z_0}{h} = \frac{\sin z_0 \cos h + \cos z_0 \sin h - \sin z_0}{h} = \cos z_0 \left( \frac{\sin h}{h} \right) - \sin z_0 \left( \frac{1 - \cos h}{h} \right).$$
Now w = f(z) will be differentiable if the limit of the right-hand side of this
expression can be shown to exist. This is most easily done by utilizing the
formal power series expansions for the sine and cosine functions, which show
that
$$\frac{\sin h}{h} = \frac{1}{h} \left[ h - \frac{h^3}{3!} + \frac{h^5}{5!} - \cdots \right] = 1 - \frac{h^2}{3!} + \frac{h^4}{5!} - \cdots$$
and
$$\frac{1 - \cos h}{h} = \frac{1}{h} \left[ 1 - 1 + \frac{h^2}{2!} - \frac{h^4}{4!} + \cdots \right] = h \left[ \frac{1}{2!} - \frac{h^2}{4!} + \cdots \right].$$
It is clear from these that because $|h| \to 0$ implies $h \to 0$, then
$$\lim_{|h| \to 0} \left( \frac{\sin h}{h} \right) = 1, \qquad \lim_{|h| \to 0} \left( \frac{1 - \cos h}{h} \right) = 0.$$
Returning to our problem, taking the limit of the difference quotient as
$|h| \to 0$ and using the above limits gives for the derivative of $w = \sin z$ at
the point $z_0$ the result
$$\lim_{|h| \to 0} \left( \frac{\sin(z_0 + h) - \sin z_0}{h} \right) = \cos z_0.$$
Once again, as $z_0$ was arbitrary, we have shown that $w = \sin z$ is differentiable
for all z, so we may write
$$\frac{d}{dz}(\sin z) = \cos z.$$
Alternative derivations of the two limits involved in this example are indicated
in Problems 10-22 and 10-23.
The following theorem is an obvious extension of Theorems 5-4 to 5-8
relating to the real variable case.
Theorem 10-5 (rules of differentiation) If f, g are differentiable functions
in some region, then throughout that region:
(a) $\frac{d}{dz}[f(z) + g(z)] = \frac{df}{dz} + \frac{dg}{dz}$ (Derivative of Sum);
(b) $\frac{d}{dz}[f(z)\,g(z)] = f(z)\frac{dg}{dz} + g(z)\frac{df}{dz}$ (Derivative of Product);
(c) $\frac{d}{dz}\left[\frac{f(z)}{g(z)}\right] = \frac{g(z)(df/dz) - f(z)(dg/dz)}{[g(z)]^2}$, provided $g(z) \ne 0$
(Derivative of Quotient);
(d) $\frac{d}{dz}\{f[g(z)]\} = f'[g(z)]\,g'(z)$ or, by writing u = g(z), this takes the
form $\frac{d}{dz}\{f[g(z)]\} = \frac{df}{du}\frac{du}{dz}$ (Chain Rule);
(e) f(z) and g(z) are continuous functions of z (Differentiability implies
Continuity).
Proof All these results may be established directly from the definition of a
derivative by arguments that are essentially similar to the real variable case.
We give the proofs of (a) and (e) as illustrations.
Result (a) follows because
$$\begin{aligned}
\frac{d}{dz}[f(z) + g(z)] &= \lim_{|h| \to 0} \left[ \frac{f(z + h) + g(z + h) - f(z) - g(z)}{h} \right] \\
&= \lim_{|h| \to 0} \left[ \frac{f(z + h) - f(z)}{h} \right] + \lim_{|h| \to 0} \left[ \frac{g(z + h) - g(z)}{h} \right] \\
&= \frac{df}{dz} + \frac{dg}{dz}.
\end{aligned}$$
Result (e) follows because differentiability of a function f(z) requires the
difference quotient $[f(z + h) - f(z)]/h$ to have a limit as $|h| \to 0$, which in
turn requires that $|f(z + h) - f(z)| \to 0$ as $|h| \to 0$. This is just the formal
statement that f(z) is continuous and so our assertion is proved.
Example 10-13 Use the derivatives established so far together with Theorem
10-5 to differentiate the functions:
(a) $w = z^2 + 3 \sin z$;
(b) $w = z^3 \sin z$;
(c) $w = 1/(1 + z)$;
(d) $w = \sin(z^2 + z + 3)$.
Solution
(a) Using Theorem 10-5 (a) with $f(z) = z^2$, $g(z) = 3 \sin z$ we obtain
$$\frac{dw}{dz} = 2z + 3 \cos z$$
for all z.
(b) Using Theorem 10-5 (b) with $f(z) = z^3$, $g(z) = \sin z$ we obtain
$$\frac{dw}{dz} = z^3 \cos z + 3z^2 \sin z$$
for all z.
(c) Using Theorem 10-5 (c) with f(z) = 1, g(z) = 1 + z we obtain
$$\frac{dw}{dz} = \frac{-1}{(1 + z)^2}$$
for $z \ne -1$.
(d) Writing $w = \sin u$, where $u = z^2 + z + 3$, enables us to apply the
chain rule (Theorem 10-5 (d)):
$$\frac{dw}{dz} = \frac{d}{du}(\sin u)\,\frac{du}{dz} = (\cos u)(2z + 1),$$
whence
$$\frac{dw}{dz} = (2z + 1) \cos(z^2 + z + 3).$$
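The four derivatives of Example 10-13 can be cross-checked against symmetric difference quotients. This is a sketch under stated assumptions: the evaluation point `Z`, the step sizes in `STEPS` and the helper names are arbitrary illustrative choices, and a numerical agreement is of course no substitute for the rules of Theorem 10-5.

```python
import cmath


def central_difference(f, z, h):
    """Symmetric difference quotient (f(z + h) - f(z - h)) / (2h) for a complex step h."""
    return (f(z + h) - f(z - h)) / (2 * h)


# label -> (function w(z), derivative dw/dz claimed in the solution)
CASES = {
    "a": (lambda z: z ** 2 + 3 * cmath.sin(z),
          lambda z: 2 * z + 3 * cmath.cos(z)),
    "b": (lambda z: z ** 3 * cmath.sin(z),
          lambda z: z ** 3 * cmath.cos(z) + 3 * z ** 2 * cmath.sin(z)),
    "c": (lambda z: 1 / (1 + z),
          lambda z: -1 / (1 + z) ** 2),
    "d": (lambda z: cmath.sin(z ** 2 + z + 3),
          lambda z: (2 * z + 1) * cmath.cos(z ** 2 + z + 3)),
}

Z = 0.4 + 0.3j                                    # arbitrary test point, away from z = -1
STEPS = (1e-5, 1e-5j, 1e-5 * cmath.exp(0.7j))     # real, imaginary and oblique steps


def max_error(f, df):
    """Largest gap between the numerical and the claimed derivative over all steps."""
    return max(abs(central_difference(f, Z, h) - df(Z)) for h in STEPS)


for label, (f, df) in CASES.items():
    print(label, max_error(f, df))
```

The same value is obtained whether the step is real, imaginary or oblique, which is the direction independence required by Definition 10-6.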
Let us now explore more carefully the implications of the requirements of
differentiability. This is perhaps best prefaced by an illustration of a simple
function of a complex variable that is not differentiable.
We shall attempt to compute the derivative at $z_0 = 0$ of the function
$f(z) = \bar{z}$, where z = x + iy. We have f(z) = x - iy, from which it follows
that f(0) = 0, so that in computing the required derivative we are led to
consider the behaviour of the difference quotient
$$\frac{f(0 + h) - f(0)}{h} = \frac{f(h) - 0}{h} = \frac{\bar{h}}{h}$$
as $|h| \to 0$. Writing $h = \alpha + i\beta$ this becomes
$$\frac{\bar{h}}{h} = \frac{\alpha - i\beta}{\alpha + i\beta} = \frac{\alpha^2 - \beta^2}{\alpha^2 + \beta^2} - i \left( \frac{2\alpha\beta}{\alpha^2 + \beta^2} \right).$$
Obviously this expression can have no limit as $|h| \to 0$ because the result is
dependent on the manner of approach of h to zero. To see this we need take
only two special cases:
(a) if $\alpha = 0$, and $\beta \to 0$, then
$$\lim_{\beta \to 0} \left( \frac{\bar{h}}{h} \right) = -1,$$
whereas (b) if $\beta = 0$, $\alpha \to 0$, then
$$\lim_{\alpha \to 0} \left( \frac{\bar{h}}{h} \right) = 1.$$
The limit thus depends on the manner in which $h \to 0$ so that no derivative
exists in the sense of Definition 10-6.
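The dependence on direction is easy to exhibit numerically. In this minimal sketch (the function name `conj_quotient` and the chosen angles are illustrative assumptions) the increment is written as $h = |h| e^{i\phi}$, so that the quotient $\bar{h}/h$ depends on the angle $\phi$ of approach but not on the size of h.

```python
import cmath
import math


def conj_quotient(phi, size):
    """Difference quotient of f(z) = conj(z) at z0 = 0 for h = size * e^{i*phi}."""
    h = size * cmath.exp(1j * phi)
    return h.conjugate() / h


for phi in (0.0, math.pi / 4, math.pi / 2):
    for size in (1e-2, 1e-6):
        print(f"phi = {phi:.4f}  |h| = {size:g}  quotient = {conj_quotient(phi, size):.6f}")
```

Approach along the real axis gives values near 1, approach along the imaginary axis gives values near -1, and shrinking |h| changes nothing, so no single limit exists.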
Obviously some conditions must be devised such that it is possible to
decide, without appeal to Definition 10-6, whether or not a given function
f(z) has a unique derivative, that is, whether or not the limit of the difference
quotient in Definition 10-6 is independent of the manner in which $h \to 0$.
Consider a function f(z), assumed to be differentiable in some region,
and express it in the form
$$f(z) = u + iv, \tag{10-4}$$
where u, v are functions of x and y by virtue of the relationship z = x + iy.
(Cf. the illustrative examples (a) to (e) following Definition 10-3.) Let us now
compute the derivative of f(z) and, in doing so, appeal to Fig. 10-3.
Fig. 10-3 Derivative of a complex function.
As f(z) is assumed to be differentiable, we shall choose an arbitrary h as
shown in the Figure and allow it to tend to zero along the line QP inclined
at an angle $\alpha$ to the x-axis. Then if $h = \lambda + i\mu$, it follows that
$z + h = x + \lambda + i(y + \mu)$, and so if we also make use of the alternative representation
of h in the form $h = |h| e^{i\alpha}$, where $|h| = (\lambda^2 + \mu^2)^{1/2}$, we have
$$f'(z) = \lim_{|h| \to 0} \left[ \frac{f(z + h) - f(z)}{h} \right] = \lim_{|h| \to 0} \left[ \frac{f[(x + \lambda) + i(y + \mu)] - f(x + iy)}{(\lambda^2 + \mu^2)^{1/2} e^{i\alpha}} \right]$$
or
$$f'(z) = e^{-i\alpha} \lim_{|h| \to 0} \left[ \frac{u(x + \lambda, y + \mu) + iv(x + \lambda, y + \mu) - u(x, y) - iv(x, y)}{(\lambda^2 + \mu^2)^{1/2}} \right]. \tag{10-5}$$
As f(z) is assumed to be differentiable, result (10-5) must be independent of
the angle $\alpha$. To see the implications of this let us first consider the real part
of the bracketed expression inside the limit (10-5) which is
$$\frac{u(x + \lambda, y + \mu) - u(x, y)}{(\lambda^2 + \mu^2)^{1/2}}. \tag{10-6}$$
By adding to and subtracting from the numerator of this expression the term
$u(x, y + \mu)$, it is soon verified, with a little manipulation, that it is equivalent
to
$$\left[ \frac{u(x + \lambda, y + \mu) - u(x, y + \mu)}{\lambda} \right] \frac{\lambda}{(\lambda^2 + \mu^2)^{1/2}} + \left[ \frac{u(x, y + \mu) - u(x, y)}{\mu} \right] \frac{\mu}{(\lambda^2 + \mu^2)^{1/2}}. \tag{10-7}$$
Geometry tells us that
$$\frac{\lambda}{(\lambda^2 + \mu^2)^{1/2}} = \cos\alpha, \qquad \frac{\mu}{(\lambda^2 + \mu^2)^{1/2}} = \sin\alpha$$
so, when taken in conjunction with the fact that $|h| \to 0$ implies $\lambda \to 0$,
$\mu \to 0$, the limit of expression (10-7) as $|h| \to 0$ becomes
$$\frac{\partial u}{\partial x} \cos\alpha + \frac{\partial u}{\partial y} \sin\alpha. \tag{10-8}$$
An identical argument applied to the imaginary part of the bracketed
expression inside the limit (10-5) yields the result
$$\frac{\partial v}{\partial x} \cos\alpha + \frac{\partial v}{\partial y} \sin\alpha.$$
Hence, the limit (10-5) is equivalent to
$$f'(z) = e^{-i\alpha} \left[ \left( \frac{\partial u}{\partial x} \cos\alpha + \frac{\partial u}{\partial y} \sin\alpha \right) + i \left( \frac{\partial v}{\partial x} \cos\alpha + \frac{\partial v}{\partial y} \sin\alpha \right) \right]. \tag{10-9}$$
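For f'(z) to be independent of the manner in which $h \to 0$, it follows that
Eqn (10-9) must be independent of the value of $\alpha$. In particular, the real and
imaginary parts of this expression must be independent of $\alpha$. Expressing
f'(z) in real-imaginary form we obtain
$$\begin{aligned}
f'(z) = {} & \left[ \frac{\partial u}{\partial x} \cos^2\alpha + \frac{\partial v}{\partial y} \sin^2\alpha + \left( \frac{\partial u}{\partial y} + \frac{\partial v}{\partial x} \right) \sin\alpha \cos\alpha \right] \\
& + i \left[ \frac{\partial v}{\partial x} \cos^2\alpha - \frac{\partial u}{\partial y} \sin^2\alpha + \left( \frac{\partial v}{\partial y} - \frac{\partial u}{\partial x} \right) \sin\alpha \cos\alpha \right].
\end{aligned} \tag{10-10}$$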
Inspection shows that it can only be independent of $\alpha$ if both the following
conditions are satisfied:
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{10-11}$$
These are known as the Cauchy-Riemann equations and are fundamental
to the development of the theory of functions of a complex variable. An
immediate consequence of the Cauchy-Riemann equations is that Eqn
(10-10) may be written either as
$$f'(z) = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} \quad (\text{case } \alpha = 0) \tag{10-12}$$
or as
$$f'(z) = \frac{\partial v}{\partial y} - i \frac{\partial u}{\partial y} \quad (\text{case } \alpha = \tfrac{1}{2}\pi). \tag{10-13}$$
It has thus been established that if a function f(z) is to have a uniquely
determined derivative at a point in the sense of Definition 10-6, then it must
satisfy the Cauchy-Riemann equations (10-11).
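The role of the angle $\alpha$ can be illustrated by evaluating the right-hand side of Eqn (10-9) directly. In the sketch below the function names and the sample point are illustrative assumptions; the partial derivatives are entered by hand for $f(z) = z^2$ (which satisfies the Cauchy-Riemann equations) and for $f(z) = \bar{z}$ (which does not).

```python
import cmath
import math


def directional_derivative(ux, uy, vx, vy, alpha):
    """Right-hand side of Eqn (10-9) for given partial derivatives and angle alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    return cmath.exp(-1j * alpha) * complex(ux * c + uy * s, vx * c + vy * s)


def spread(ux, uy, vx, vy, steps=12):
    """Largest gap between values of (10-9) as alpha runs once round the circle."""
    values = [
        directional_derivative(ux, uy, vx, vy, 2 * math.pi * k / steps)
        for k in range(steps)
    ]
    return max(abs(value - values[0]) for value in values)


x, y = 1.5, -0.5

# f(z) = z^2:      u = x^2 - y^2, v = 2xy  ->  u_x = 2x, u_y = -2y, v_x = 2y, v_y = 2x
print("z^2     spread:", spread(2 * x, -2 * y, 2 * y, 2 * x))

# f(z) = conj(z):  u = x, v = -y           ->  u_x = 1, u_y = 0, v_x = 0, v_y = -1
print("conj(z) spread:", spread(1.0, 0.0, 0.0, -1.0))
```

For $z^2$ the value is the same for every $\alpha$ and equals 2z, whereas for $\bar{z}$ it varies with $\alpha$, in line with conditions (10-11).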
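We now check whether the converse, the satisfaction of the Cauchy-Riemann
equations by a function automatically implying that the function
has a unique derivative, also holds. Let w = u + iv be a function such that
u, v satisfy Eqns (10-11). Consider first the function u at some point
z = x + iy. We know from Chapter 5 that at a neighbouring point z + h with
$h = \lambda + i\mu$, for $\Delta u = u(x + \lambda, y + \mu) - u(x, y)$ we may substitute the
expression
$$\Delta u = \frac{\partial u}{\partial x} \lambda + \frac{\partial u}{\partial y} \mu + \varepsilon_1 \lambda + \eta_1 \mu,$$
where $\varepsilon_1, \eta_1 \to 0$ as $\lambda, \mu \to 0$ provided that $u_x$ and $u_y$ are continuous. A
similar result is of course true for $\Delta v$, the change in v consequent upon moving
from z to z + h, though for $\varepsilon_1$, $\eta_1$ we must substitute $\varepsilon_2$, $\eta_2$ and require that
$v_x$, $v_y$ are continuous.
Thus if $\Delta f = f(z + h) - f(z)$, we have
$$\Delta f = \frac{\partial u}{\partial x} \lambda + \frac{\partial u}{\partial y} \mu + i \left( \frac{\partial v}{\partial x} \lambda + \frac{\partial v}{\partial y} \mu \right) + (\varepsilon_1 + i\varepsilon_2)\lambda + (\eta_1 + i\eta_2)\mu.$$
Using the Cauchy-Riemann equations this can be re-expressed as
$$\Delta f = \left( \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} \right) h + (\varepsilon_1 + i\varepsilon_2)\lambda + (\eta_1 + i\eta_2)\mu,$$
whence
$$\frac{\Delta f}{h} = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} + (\varepsilon_1 + i\varepsilon_2) \left( \frac{\lambda}{h} \right) + (\eta_1 + i\eta_2) \left( \frac{\mu}{h} \right). \tag{10-14}$$
However, $|\lambda| \le |h|$, $|\mu| \le |h|$ so that
$$\left| \frac{\lambda}{h} \right| \le 1, \qquad \left| \frac{\mu}{h} \right| \le 1;$$
and as $\varepsilon_1$, $\varepsilon_2$, $\eta_1$, and $\eta_2$ all tend to zero as $\lambda, \mu \to 0$, by taking the limit of
Eqn (10-14) as $|h| \to 0$ we arrive at
$$f'(z) = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x}.$$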
The fact that f(z) is assumed to satisfy the Cauchy-Riemann equations
and to have continuous partial derivatives $u_x$, $u_y$, $v_x$, and $v_y$ has thus enabled
us to prove that f(z) has a unique derivative. We have established the following
fundamental theorem.
Theorem 10-6 (Cauchy-Riemann theorem) If u(x, y) and v(x, y) have
continuous first order partial derivatives in some region, then necessary and
sufficient conditions that f(z) = u + iv should have a derivative at each point
z = x + iy of that region are that
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$
Results (10-12) and (10-13) may be used to deduce the form of f'(z) by
using the simple observation that when z is purely real, so that z = x, the
forms assumed by f'(z) and f'(x) are identical. Similarly, when z is purely
imaginary, so that z = iy, the forms of f'(z) and f'(iy) are identical. This
gives the following straightforward rule for determining the derivative
f'(z) of the function f(z) which is sometimes helpful.
Rule 1 (Determination of the derivative of a complex function)
If f(z) = u + iv satisfies the Cauchy-Riemann equations, then the derivative
f'(z) expressed in terms of z may either be deduced
(a) from the result
$$f'(z) = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x}$$
by formally setting y = 0, and then replacing x by z; or
(b) from the result
$$f'(z) = \frac{\partial v}{\partial y} - i \frac{\partial u}{\partial y}$$
by formally setting x = 0, and then replacing iy by z.
Example 10-14 Determine which of the following functions satisfy the
Cauchy-Riemann equations and thus possess uniquely defined derivatives.
Give the form of this derivative when it is defined.
(a) $w = z^2$;
(b) $w = \cos z$;
(c) $w = |z|$.
Solution
(a) If $w = z^2$, then $w = (x + iy)^2 = x^2 - y^2 + i2xy$ and so $u = x^2 - y^2$,
$v = 2xy$. So $u_x = 2x$, $u_y = -2y$, $v_x = 2y$, and $v_y = 2x$. It is readily seen
that these expressions satisfy the Cauchy-Riemann equations and so we may
conclude that $w = z^2$ possesses a unique derivative. It follows from Eqn
(10-12) that
$$f'(z) = 2x + i2y = 2z.$$
This result was so simple that appeal to Rule 1 was not necessary.
(b) If $w = \cos z$, then $w = \cos(x + iy) = \cos x \cos iy - \sin x \sin iy$,
whence $w = \cos x \cosh y - i \sin x \sinh y$, and so $u = \cos x \cosh y$,
$v = -\sin x \sinh y$. Hence, $u_x = -\sin x \cosh y$, $u_y = \cos x \sinh y$,
$v_x = -\cos x \sinh y$ and $v_y = -\sin x \cosh y$. Here also it is immediately
apparent that the expressions satisfy the Cauchy-Riemann equations,
showing that $w = \cos z$ possesses a unique derivative.
Let us choose to work with Rule 1 (a) to determine f'(z) in terms of z.
We must therefore start with the equation
$$f'(z) = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x}.$$
In this case we find
$$f'(z) = -\sin x \cosh y - i \cos x \sinh y.$$
Then, setting y = 0 and replacing x by z gives
$$f'(z) = -\sin z.$$
It is instructive to compare this rapid method with the direct approach
we now indicate.
$$\begin{aligned}
f'(z) &= -\sin x \cosh y - i \cos x \sinh y \\
&= -\sin x \cos iy - \cos x \sin iy \\
&= -\sin(x + iy) = -\sin z.
\end{aligned}$$
(c) If $w = |z|$, then $w = (x^2 + y^2)^{1/2}$, showing that $u = (x^2 + y^2)^{1/2}$,
$v = 0$. Then, as $u_x = x/(x^2 + y^2)^{1/2}$, $u_y = y/(x^2 + y^2)^{1/2}$, $v_x = v_y = 0$, it is
clear that $w = |z|$ cannot satisfy the Cauchy-Riemann equations anywhere
in the complex plane. We conclude that $w = |z|$ has no derivative at any
point in the complex plane.
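Parts (b) and (c) of Example 10-14 lend themselves to a numerical spot check. In this sketch the partial derivatives are those quoted in the solution, while the function names and the sample point are illustrative assumptions. The residual measures how far the Cauchy-Riemann equations are from being satisfied.

```python
import cmath
import math


def partials_cos(x, y):
    """(u_x, u_y, v_x, v_y) for w = cos z, with u = cos x cosh y, v = -sin x sinh y."""
    return (-math.sin(x) * math.cosh(y),
            math.cos(x) * math.sinh(y),
            -math.cos(x) * math.sinh(y),
            -math.sin(x) * math.cosh(y))


def partials_modulus(x, y):
    """(u_x, u_y, v_x, v_y) for w = |z|, valid away from the origin."""
    r = math.hypot(x, y)
    return (x / r, y / r, 0.0, 0.0)


def cr_residual(partials):
    """max(|u_x - v_y|, |u_y + v_x|); zero exactly when Eqns (10-11) hold."""
    ux, uy, vx, vy = partials
    return max(abs(ux - vy), abs(uy + vx))


x, y = 0.7, -1.1
ux, uy, vx, vy = partials_cos(x, y)

print("cos z residual:", cr_residual(partials_cos(x, y)))
print("|z|   residual:", cr_residual(partials_modulus(x, y)))
print("u_x + i v_x   :", complex(ux, vx))
print("-sin z        :", -cmath.sin(complex(x, y)))
```

For cos z the residual vanishes and $u_x + iv_x$ agrees with $-\sin z$, while for |z| the residual is clearly non-zero at the sample point.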
Example 10-15 Determine the constants a and b in order that
$$w = x^2 + ay^2 - 2xy + i(bx^2 - y^2 + 2xy)$$
should satisfy the Cauchy-Riemann equations. Deduce the derivative of w.
Solution Here we have $u = x^2 + ay^2 - 2xy$, $v = bx^2 - y^2 + 2xy$ so that
$u_x = 2x - 2y$, $u_y = 2ay - 2x$, $v_x = 2bx + 2y$, and $v_y = -2y + 2x$. It is
certainly true that $u_x = v_y$, so that the first of the Cauchy-Riemann equations
is automatically satisfied. For the second equation to be satisfied we must
require that $u_y = -v_x$, or $2ay - 2x = -(2bx + 2y)$. This is only possible
if a = -1, b = 1.
Now as $f'(z) = u_x + iv_x$, we have
$$f'(z) = 2x - 2y + i(2x + 2y).$$
Again, working with Rule 1 (a) gives
$$f'(z) = 2(1 + i)z.$$
Had we chosen to work with Rule 1 (b) to express f'(z) in terms of z we should
have started from the equation
$$f'(z) = v_y - iu_y$$
which in this case becomes
$$f'(z) = -2y + 2x + i(2y + 2x).$$
Then, setting x = 0 and this time replacing iy by z, we again arrive at
$$f'(z) = 2(1 + i)z.$$
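The result of Example 10-15 can be checked with difference quotients taken along the real and imaginary directions. This is an illustrative sketch only: the helper names, the test point and the step size are assumptions, and the case a = 1 is included purely to show what goes wrong when the second Cauchy-Riemann equation fails.

```python
def w(z, a, b):
    """w = x^2 + a y^2 - 2xy + i(b x^2 - y^2 + 2xy) evaluated at z = x + iy."""
    x, y = z.real, z.imag
    return complex(x * x + a * y * y - 2 * x * y, b * x * x - y * y + 2 * x * y)


def quotient(z, h, a, b):
    """Difference quotient (w(z + h) - w(z)) / h for a complex increment h."""
    return (w(z + h, a, b) - w(z, a, b)) / h


z = 0.5 + 0.25j
step = 1e-6

for a, b in ((-1.0, 1.0), (1.0, 1.0)):
    along_real = quotient(z, step, a, b)
    along_imag = quotient(z, 1j * step, a, b)
    print(f"a = {a}, b = {b}: real direction {along_real:.5f}, imaginary direction {along_imag:.5f}")

print("2(1 + i)z =", 2 * (1 + 1j) * z)
```

With a = -1, b = 1 both directions give values close to $2(1 + i)z$; with a = 1 the two directions disagree, so no derivative exists.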
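As the complex number z can also be expressed in modulus argument form
by writing $z = re^{i\theta}$, it is necessary to know the form taken by the
Cauchy-Riemann equations in terms of the variables $(r, \theta)$. This is most readily
achieved by appeal to Theorem 5-22.
It follows directly from Theorem 5-22 that:
$$\left. \begin{aligned}
\frac{\partial u}{\partial x} &= \frac{\partial r}{\partial x}\frac{\partial u}{\partial r} + \frac{\partial \theta}{\partial x}\frac{\partial u}{\partial \theta}, &
\frac{\partial u}{\partial y} &= \frac{\partial r}{\partial y}\frac{\partial u}{\partial r} + \frac{\partial \theta}{\partial y}\frac{\partial u}{\partial \theta}, \\
\frac{\partial v}{\partial x} &= \frac{\partial r}{\partial x}\frac{\partial v}{\partial r} + \frac{\partial \theta}{\partial x}\frac{\partial v}{\partial \theta}, &
\frac{\partial v}{\partial y} &= \frac{\partial r}{\partial y}\frac{\partial v}{\partial r} + \frac{\partial \theta}{\partial y}\frac{\partial v}{\partial \theta}.
\end{aligned} \right\} \tag{10-15}$$
In these equations $(r, \theta)$ are the polar coordinates of the point (x, y) and so
$$x = r\cos\theta, \qquad y = r\sin\theta.$$
(See Eqns (4-13).)
These relationships may now be used to determine $\partial r/\partial x$, $\partial r/\partial y$, $\partial\theta/\partial x$,
$\partial\theta/\partial y$, each taken with the other Cartesian variable held constant, as follows:
$$r^2 = x^2 + y^2 \quad \text{whence} \quad 2r\frac{\partial r}{\partial x} = 2x \quad \text{and so} \quad \frac{\partial r}{\partial x} = \frac{x}{r} = \cos\theta,$$
and in the same way
$$\frac{\partial r}{\partial y} = \frac{y}{r} = \sin\theta;$$
$$\tan\theta = \frac{y}{x} \quad \text{whence} \quad \sec^2\theta\,\frac{\partial\theta}{\partial x} = -\frac{y}{x^2} \quad \text{and so} \quad \frac{\partial\theta}{\partial x} = -\frac{\sin\theta}{r},$$
$$\sec^2\theta\,\frac{\partial\theta}{\partial y} = \frac{1}{x} \quad \text{and so} \quad \frac{\partial\theta}{\partial y} = \frac{\cos\theta}{r}.$$
Combination of these results with Eqns (10-15), followed by some simple
manipulation, then establishes that the polar form of the Cauchy-Riemann
equations is
$$\frac{\partial u}{\partial r} = \frac{1}{r}\frac{\partial v}{\partial \theta} \quad \text{and} \quad \frac{1}{r}\frac{\partial u}{\partial \theta} = -\frac{\partial v}{\partial r}. \tag{10-16}$$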
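The partial derivatives of r and $\theta$ with respect to x and y can be confirmed by finite differences, and the polar Cauchy-Riemann equations can be tried on a simple case. The sketch below is illustrative: the helper names and sample point are assumptions, and $f(z) = z^2$, for which $u = r^2\cos 2\theta$ and $v = r^2\sin 2\theta$, is chosen only as a convenient test function.

```python
import math


def r_of(x, y):
    return math.hypot(x, y)


def theta_of(x, y):
    return math.atan2(y, x)


def central(g, x, y, wrt, h=1e-6):
    """Central-difference partial derivative of g(x, y) with respect to 'x' or 'y'."""
    if wrt == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


def polar_cr_residual(r, theta):
    """Residuals of Eqn (10-16) for f(z) = z^2, using hand-computed partials."""
    u_r = 2 * r * math.cos(2 * theta)
    u_theta = -2 * r * r * math.sin(2 * theta)
    v_r = 2 * r * math.sin(2 * theta)
    v_theta = 2 * r * r * math.cos(2 * theta)
    return abs(u_r - v_theta / r), abs(u_theta / r + v_r)


x, y = 1.2, 0.9                      # sample point, away from the branch cut of atan2
r, theta = r_of(x, y), theta_of(x, y)

print("dr/dx     ", central(r_of, x, y, "x"), math.cos(theta))
print("dr/dy     ", central(r_of, x, y, "y"), math.sin(theta))
print("dtheta/dx ", central(theta_of, x, y, "x"), -math.sin(theta) / r)
print("dtheta/dy ", central(theta_of, x, y, "y"), math.cos(theta) / r)
print("polar C-R residuals for z^2:", polar_cr_residual(r, theta))
```

Each printed pair should agree closely, and both residuals of (10-16) vanish for the test function.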
Functions f(z) that are uniquely defined in some neighbourhood of a
point $z_0$ and satisfy the Cauchy-Riemann equations at $z_0$ and throughout
that neighbourhood are called either analytic or regular functions. Points at
which a function ceases to be analytic are called singularities of the function.
Thus the function f(z) = 1/(z + 1) is easily seen to be analytic everywhere
except at the point z = -1, which is a singularity.
Supposing that $u_{xy}$, $v_{xy}$ exist and are continuous, it follows directly by
partial differentiation of the Cauchy-Riemann equations $u_x = v_y$, $u_y = -v_x$
that
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad \text{and} \quad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0. \tag{10-17}$$
These equations are identical in form and are examples of an important
partial differential equation called Laplace's equation, any solution of which
is called a harmonic function. The harmonic functions u and v associated
with an analytic function f(z) = u + iv are called conjugate harmonic
functions. For example, we have seen that
$$\cos z = \cos x \cosh y - i \sin x \sinh y$$
is an analytic function with $u = \cos x \cosh y$, $v = -\sin x \sinh y$. Now both
u and v are such that $u_{xy}$, $v_{xy}$ are continuous, so it follows immediately that
u and v satisfy Eqns (10-17). Hence $u = \cos x \cosh y$, $v = -\sin x \sinh y$ are
conjugate harmonic functions. The term conjugate is, of course, used here
in a different sense from when discussing complex conjugates.
If $u$, $v$ are conjugate harmonic functions and we consider the analytic function
$w = u + iv$, then an obvious modification of the arguments that gave rise
to Rule 1 leads to the following rule for the expression of $w$ in terms of $z$.
Rule 2 (Expression of an analytic function in terms of z)
If $u$, $v$ are conjugate harmonic functions, then the analytic function
$w = u + iv$ expressed in terms of $z$ may be deduced either by:
(a) formally setting $y = 0$ in the expression $w = u + iv$ and then
replacing $x$ by $z$; or
(b) formally setting $x = 0$ in the expression $w = u + iv$ and then
replacing $iy$ by $z$.
Example 10-16 Show that $u = 2xy + 3y$ is harmonic and determine its
harmonic conjugate $v$. Express the functions $dw/dz$ and $w = u + iv$ in terms
of $z$.
Solution We have $u_x = 2y$, $u_{xx} = 0$, $u_y = 2x + 3$, $u_{yy} = 0$, showing that
$u_{xx} + u_{yy} = 0$. Hence $u$ is harmonic. If $v$ is to be the harmonic conjugate of
$u$ then the functions $u$, $v$ must satisfy the Cauchy-Riemann equations
$$u_x = v_y, \qquad u_y = -v_x.$$
Using the known expressions for $u_x$, $u_y$ we find that
(a) $2y = v_y$, and (b) $2x + 3 = -v_x$.
Integration then gives:
from (a),
$$v = y^2 + f(x) + \text{const},$$
from (b),
$$v = -x^2 - 3x + g(y) + \text{const},$$
where as yet $f(x)$ is an arbitrary function of $x$ and $g(y)$ is an arbitrary function
of $y$. However, as these are two alternative expressions for the same function
$v$ they must be identical, whence $f(x) = -(x^2 + 3x)$ and $g(y) = y^2$. Thus
we have arrived at the expression
$$v = y^2 - x^2 - 3x + \text{const}$$
for the function $v$, which is the harmonic conjugate of $u$.
Applying Rule 1 (a) to find $f'(z)$ requires that we start from
$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}$$
or, in this case, from
$$f'(z) = 2y - i(2x + 3).$$
So, setting $y = 0$ and replacing $x$ by $z$, gives
$$f'(z) = -i(2z + 3).$$
To express $w = u + iv$ in terms of $z$ we must work with Rule 2. We have
$$w = (2xy + 3y) + i(y^2 - x^2 - 3x) + \text{const},$$
so that if we apply Rule 2 (a), we must set $y = 0$ and replace $x$ by $z$ to arrive
at
$$w = -i(z^2 + 3z) + \text{const}.$$
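The result of Example 10-16 is easily confirmed numerically. The sketch below, with the arbitrary constant taken as zero and with sample points and a step `h` chosen only for illustration, compares $-i(z^2 + 3z)$ with $u + iv$ at a few points and compares a central-difference estimate of the derivative with $-i(2z + 3)$.

```python
def w(z):
    """w = -i(z^2 + 3z), the arbitrary constant being taken as zero."""
    return -1j * (z * z + 3 * z)

def u(x, y):
    return 2 * x * y + 3 * y

def v(x, y):
    return y * y - x * x - 3 * x

samples = [(0.5, -1.25), (2.0, 3.0), (-1.5, 0.75)]  # illustrative points

# Largest difference between w(z) and u + iv over the sample points.
max_err = max(abs(w(complex(x, y)) - complex(u(x, y), v(x, y)))
              for x, y in samples)

# Central-difference estimate of dw/dz compared with -i(2z + 3).
h = 1e-6
z0 = complex(0.5, -1.25)
deriv_err = abs((w(z0 + h) - w(z0 - h)) / (2 * h) - (-1j * (2 * z0 + 3)))
print(max_err, deriv_err)
```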
It is important to notice when using Rule 2 that the functions $u$ and $v$
must be conjugate harmonic functions, since otherwise they will not satisfy
the Cauchy-Riemann equations and the rule will be inapplicable. Indeed, if
the rule is applied to harmonic functions that are not conjugate, then the
functions of $z$ that are generated by Rules 2 (a) and 2 (b) may or may not
be identical. In neither case will the result be correct. For example,
$$u = \sin x\cosh y \quad\text{and}\quad v = \cos x\cosh y$$
are harmonic functions but they are not harmonic conjugates. Applying
Rule 2 (a) to $w = u + iv$ generates the function $w = \sin z + i\cos z$, whereas
applying Rule 2 (b) generates the function $w = i\cos z$. For a different
example, take $u = x^2 - y^2$ and $v = xy$, which are also harmonic functions
that are not conjugate. In this case both Rules 2 (a) and 2 (b) generate the
same function $w = z^2$, though of course this also is incorrect.
10-5 Conformal mapping
Thus far we have examined some of the analytical consequences of requiring
that a function $w = f(z)$ be differentiable. Let us now pursue this matter
further by studying some of the geometrical implications of differentiability.
Take two complex planes, which we shall refer to as the z-plane and the
w-plane, the connection between their respective points being through the
differentiable function $w = f(z)$. Because each value of $z$ gives rise to a unique
value of $w$, it follows that any curve $\gamma$ in the z-plane must correspond to
some other curve $\Gamma$ in the w-plane. In this sense the w-plane can correctly be
described as a mapping of the z-plane.
For a specific illustration, let us determine how the straight line $y = \alpha x$
in the z-plane is mapped by the function $w = iz + (1 + i)$ onto the w-plane.
We begin by setting $w = u + iv$, $z = x + iy$, after which a simple calculation
yields $u = 1 - y$, $v = x + 1$. Hence to find the line in the w-plane that
corresponds to $y = \alpha x$ in the z-plane it is now only necessary to set $y = \alpha x$
in these expressions for $u$, $v$ and then to eliminate $x$ between them. Performing
these operations we find $u = 1 - \alpha x$, $v = x + 1$, whence
$$v = -\frac{1}{\alpha}u + \frac{1 + \alpha}{\alpha}.$$
This is again an equation of a straight line but this time in the w-plane.
The line passes through the point $(0, (1 + \alpha)/\alpha)$ and has the gradient $-1/\alpha$.
Fig. 10-4 Mapping by the function $w = iz + (1 + i)$.
Representative lines $\gamma_1$, $\gamma_2$ are shown in the z-plane of Fig. 10-4 and their
respective maps or images are shown as the lines $\Gamma_1$, $\Gamma_2$ in the associated
w-plane. The lines $\gamma_1$, $\gamma_2$ correspond, respectively, to $\alpha = 1$, $\alpha = 2$.
It is not difficult to see that the picture in the w-plane has been obtained from
the picture in the z-plane by first rotating the original pair of lines anti-clockwise
through an angle $\frac{1}{2}\pi$ and then translating the resulting picture to the point
$1 + i$ as a new origin. More important than this, however, is the fact that
the angle $\theta$ between the lines $\gamma_1$, $\gamma_2$ is equal to the angle between the lines
$\Gamma_1$, $\Gamma_2$ and, moreover, the sense of rotation is preserved. That is to say if $\gamma_2$
is inclined to $\gamma_1$ at an angle $\theta$, measured anti-clockwise, then $\Gamma_2$ is also inclined
to $\Gamma_1$ at an angle $\theta$, measured anti-clockwise.
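The image line found above can be confirmed by direct substitution. In this sketch (the helper names and the sample values of $x$ are illustrative assumptions) points on $y = \alpha x$ are mapped by $w = iz + (1 + i)$, and the quantity $v - [-u/\alpha + (1 + \alpha)/\alpha]$ is evaluated; it should vanish for every point.

```python
def linear_map(z):
    """The mapping w = iz + (1 + i) of the introductory example."""
    return 1j * z + (1 + 1j)

def image_line_residual(alpha, x):
    """v - (-(1/alpha) u + (1 + alpha)/alpha) for the image of (x, alpha x)."""
    w = linear_map(complex(x, alpha * x))
    return w.imag - (-w.real / alpha + (1 + alpha) / alpha)

# The two representative lines alpha = 1 and alpha = 2 of Fig. 10-4.
residuals = [image_line_residual(alpha, x)
             for alpha in (1.0, 2.0)
             for x in (-2.0, 0.0, 0.5, 3.0)]
print(residuals)
```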
This is no chance result and, indeed, we now prove that if a function
$f(z)$ is analytic (that is, satisfies the Cauchy-Riemann equations and so has a
uniquely defined derivative) then, except for points $z_0$ at which $f'(z_0) = 0$,
the function $w = f(z)$ will preserve both the angle and the sense of rotation
when mapping intersecting curves $\gamma_1$, $\gamma_2$ in the z-plane onto corresponding
intersecting curves $\Gamma_1$, $\Gamma_2$ in the w-plane. These properties of a mapping or
transformation are recognized by saying that the transformation is conformal.
To prove this general result we now consider a function $w = f(z)$ that is
analytic in some region of the z-plane and take a point $z_0$ in that region at
which $f'(z_0) \neq 0$. Let $\gamma_1$, $\gamma_2$ be two curves drawn in the z-plane that intersect
at $z_0$, the point $P$, and let $z_1$ denote a point $Q$ on the curve $\gamma_1$ as indicated in
Fig. 10-5. We
shall suppose that as $Q$ moves away from $P$ along $\gamma_1$ in the direction indicated
by an arrow in the Figure, so the point $w_1 = f(z_1)$, which we denote by $Q'$,
moves away from the point $P'$, the image $w_0 = f(z_0)$ of $P$, in the direction
indicated. This process thus
associates a sense of direction with each of the corresponding curves $\gamma_1$ and
$\Gamma_1$. A similar argument defines directions along $\gamma_2$ and $\Gamma_2$.
Fig. 10-5 Conformal mapping $w = f(z)$.
Now as $Q$ approaches $P$, so the secant $PQ$ will assume its limiting position
in which, when it is inclined at an angle $\alpha_1$ to the x-axis, it is tangent to $\gamma_1$
at $z_0$. As $PQ = z_1 - z_0$ we have
$$\alpha_1 = \lim_{z_1 \to z_0} \arg(z_1 - z_0).$$
Identical reasoning shows that
$$\beta_1 = \lim_{w_1 \to w_0} \arg(w_1 - w_0),$$
where $\beta_1$ is the angle of the tangent to $\Gamma_1$ at $P'$ measured from the u-axis.
Hence we have
$$\beta_1 - \alpha_1 = \lim_{w_1 \to w_0} \arg(w_1 - w_0) - \lim_{z_1 \to z_0} \arg(z_1 - z_0)$$
and, as $\arg a - \arg b = \arg(a/b)$, this may be written
$$\beta_1 - \alpha_1 = \lim_{z_1 \to z_0} \arg\left(\frac{w_1 - w_0}{z_1 - z_0}\right).$$
However, as we are assuming $f(z)$ is differentiable
$$f'(z_0) = \lim_{z_1 \to z_0} \left(\frac{w_1 - w_0}{z_1 - z_0}\right),$$
and provided $f'(z_0) \neq 0$ it then follows that
$$\beta_1 - \alpha_1 = \arg f'(z_0).$$  (10-18)
In the case that $f'(z_0) = 0$, the amplitude of $f'(z_0)$ is indeterminate. Such
points are called critical points of $f(z)$, by analogy with the real variable case.
We have seen that $f'(z_0)$ is unique, so that the expression on the right-hand
side of Eqn (10-18) is a constant. The result must, then, also be true for
any other curve $\gamma_2$, say, and its map $\Gamma_2$. Hence we have
$$\beta_1 - \alpha_1 = \beta_2 - \alpha_2$$
or
$$\alpha_2 - \alpha_1 = \beta_2 - \beta_1.$$
The curves $\gamma_1$, $\gamma_2$ were any two curves which intersected at $z_0$, so we have
proved the following result.
Theorem 10-7 (conformal mapping) If $f(z)$ is analytic in some region,
then apart from those points $z_0$ in that region for which $f'(z_0) = 0$, the
mapping $w = f(z)$ preserves both the angle and the sense of rotation when
mapping intersecting directed pairs of curves in the z-plane into corresponding
intersecting directed pairs of curves in the w-plane. Such a mapping is said
to be conformal.
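Theorem 10-7 may be illustrated numerically. The following is a rough sketch only: the directions of the image curves are estimated from a single short step of length `h`, and the function $w = z^2$, the point $1 + \frac{1}{2}i$ and the two directions are illustrative choices, not taken from the text. At a point where $f'(z_0) \neq 0$ the angle between two directions is reproduced, whereas at the critical point $z_0 = 0$ of $w = z^2$ it is doubled.

```python
import cmath
import math

def image_direction(f, z0, angle, h=1e-6):
    """Approximate direction of the image of a short step from z0."""
    step = h * cmath.exp(1j * angle)
    return cmath.phase(f(z0 + step) - f(z0))

def image_angle_between(f, z0, angle1, angle2):
    """Signed angle from the image of direction 1 to the image of direction 2."""
    d = image_direction(f, z0, angle2) - image_direction(f, z0, angle1)
    return (d + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)

def square(z):
    return z * z

# Two directions through z0 that are inclined at 0.7 radians to one another.
angle_at_regular_point = image_angle_between(square, 1 + 0.5j, 0.2, 0.9)
angle_at_critical_point = image_angle_between(square, 0j, 0.2, 0.9)
print(angle_at_regular_point, angle_at_critical_point)
```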
To close this chapter we now examine some important special conformal
mappings. Rather than emphasize the algebraic details of the transformations
or mappings, we shall aim primarily at interpretation in terms of basic
geometrical operations such as translation, rotation, and change of scale
(dilatation).
10-5 (a) The general linear transformation
The general linear transformation is the name given to the mapping described
by the equation
$$w = az + b,$$  (10-19)
where $a$, $b$ are arbitrary constants with $a \neq 0$. Our introductory example was
of this form with $a = i$, $b = 1 + i$. The mapping (10-19) obviously satisfies
the Cauchy-Riemann equations and, as $dw/dz = a \neq 0$, it has no critical
points and so provides a conformal mapping of the entire z-plane. To
appreciate the geometrical effect of this mapping consider first the case in
which $a = 1$ so that $w = z + b$.
This has the effect of generating the w-plane by simply adding a constant
complex number $b$ to every point in the z-plane. Using the vectorial
representation of complex numbers this is seen to be equivalent to generating the
w-plane by shifting the entire z-plane through a distance $|b|$ parallel to the
vector $b$. Such a mapping is accordingly called a translation. Another way of
expressing this result is by saying that if the w- and z-planes were to be
superimposed, then the $O\{u, v\}$ axes would be obtained by translating the
$O\{x, y\}$ axes, without rotation, such that in their new position the origin
coincided with the point $z = -b$. To see this, remember that $b$ is a vector
and that, as $z = 0$ maps into $w = b$, the position vector of the origin of
$O\{x, y\}$ relative to $O\{u, v\}$ is $b$, so that the position vector of the origin
of $O\{u, v\}$ relative to $O\{x, y\}$ is $-b$.
Consequently, we may conclude that the mapping $w = z + b$ leaves invariant
the shape and size of any curve in the z-plane.
Next we consider the consequences of setting $b = 0$ so that $w = az$. If
we write $a = \rho e^{i\alpha}$ and $z = re^{i\theta}$, we have $w = \rho r e^{i(\alpha + \theta)}$. This shows that the
effect on the z-plane of the mapping $w = az$ is to multiply the modulus of $z$ by
a constant factor $\rho$ and to increase the argument of $z$ by a constant angle $\alpha$.
Hence $w = az$ corresponds to a magnification, or dilatation, of every $z$ by a
constant factor $|a|$, and a rotation about the origin of every $z$ by a constant
angle $\alpha$. Thus we may deduce that the general linear transformation
$$w = az + b$$
of the z-plane may be described geometrically as the combination of a
dilatation, a rotation, and a translation. In the trivial case $a = 1$, $b = 0$ the
mapping reduces to an identity.
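The geometrical description just given can be written out step by step. In this sketch the function name `linear_in_steps` and the sample point are illustrative assumptions; the constants $a = i$, $b = 1 + i$ are those of the introductory example. The three operations applied in turn should reproduce $az + b$.

```python
import cmath

def linear_in_steps(z, a, b):
    """Apply w = az + b as a dilatation, then a rotation, then a translation."""
    rho, alpha = cmath.polar(a)               # a = rho * exp(i * alpha)
    dilated = rho * z                         # modulus multiplied by |a|
    rotated = dilated * cmath.exp(1j * alpha) # argument increased by alpha
    return rotated + b                        # shift by the vector b

a, b = 1j, 1 + 1j
z = 2 - 3j                                    # illustrative point
direct = a * z + b
stepwise = linear_in_steps(z, a, b)
print(direct, stepwise)
```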
10-5 (b) The mapping $w = z^n$
A typical example of this form is provided by the function $w = z^2$. As it is
interesting to interpret mappings in terms of both polar coordinates and
cartesian coordinates, let us first study the polar representation. To do this
we set $z = re^{i\theta}$, $w = \rho e^{i\phi}$, when we find
$$\rho(\cos\phi + i\sin\phi) = r^2(\cos 2\theta + i\sin 2\theta),$$
showing that $\rho = r^2$ and $\phi = 2\theta + 2n\pi$, where $n = 0, 1, 2, \ldots$. However,
for our purposes we shall disregard this ambiguity of the angle $\phi$ with respect
to multiples of $2\pi$, since all angles in polar coordinates are indeterminate in
this manner.
In words, the effect of the mapping $w = z^2$ is to square the modulus of
every number $z$ and to double its argument. This is very easily illustrated by
appeal to Fig. 10-6 depicting the mapping of a shaded portion of an annular
region in the z-plane into another, larger, annular region in the w-plane. The
conformal nature of the mapping is reflected by the fact that at the
corresponding corners of the figures the angles between the boundary lines together
with their senses have been preserved. They are of course equal to $\frac{1}{2}\pi$ in this
instance.
Because of the properties just outlined it is readily seen that the function
$w = z^2$ maps the upper half z-plane onto the entire w-plane. When this is
done it is necessary to exclude the origin in the w-plane together with all the
points on the positive u-axis, since these are mapped twice. In fact they
correspond to points on both the positive and negative parts of the real axis
in the z-plane. The origin is in fact a critical point, for $w' = 2z$
vanishes at $z = 0$. This exclusion of a line of points in the w-plane is often
described by saying that the w-plane has been cut along the real axis.
Fig. 10-6 The polar mapping $w = z^2$: (a) z-plane; (b) w-plane.
The effect of the mapping is more striking if it is displayed in terms of
$x$ and $y$ by again setting $w = u + iv$, but this time writing $z = x + iy$ to
obtain $u = x^2 - y^2$, $v = 2xy$. These equations show, for example, that the
straight line $x = \alpha$ maps into the curve $u = \alpha^2 - y^2$, $v = 2\alpha y$ in the w-plane
which, after elimination of $y$, is seen to be equivalent to $v^2 = 4\alpha^2(\alpha^2 - u)$.
Similarly, the straight line $y = \beta$ may be seen to map into the curve
$v^2 = 4\beta^2(\beta^2 + u)$ in the w-plane. These equations describe two parabolas that are
symmetrical about the u-axis, as shown in Fig. 10-7.
Fig. 10-7 The Cartesian mapping $w = z^2$: (a) z-plane; (b) w-plane.
The lines $x = 1$, $y = 3/2$ denoted by $\gamma_1$ and $\gamma_2$, respectively, in the z-plane
map into the parabolas $\Gamma_1$ and $\Gamma_2$ in the w-plane. The parabolas $\Gamma_1$ and $\Gamma_2$
intersect in the pair of points $P'$ and $P''$ in the w-plane. The single
point $z = 1 + 3i/2$ denoted by $P$ in the z-plane (that is, the point $(1, 3/2)$)
maps into only one of these, namely the point $w = -5/4 + 3i$; the other point
of intersection, $w = -5/4 - 3i$, is the image of points such as $z = 1 - 3i/2$
that lie on only one of the two lines. Again the conformal nature
of the transformation is reflected in the easily checked geometrical fact that
the two families of parabolas are mutually orthogonal, as are the lines
$x = \text{const}$, $y = \text{const}$ in the z-plane.
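The two families of parabolas may be checked by substitution. In the sketch below the helper names and the sample values are illustrative assumptions; each helper returns the amount by which the image of a point on $x = \alpha$ or $y = \beta$ fails to satisfy $v^2 = 4\alpha^2(\alpha^2 - u)$ or $v^2 = 4\beta^2(\beta^2 + u)$, respectively, and the image of the point $(1, 3/2)$ is also computed.

```python
def square_map(z):
    return z * z

def vertical_line_residual(alpha, y):
    """v^2 - 4 alpha^2 (alpha^2 - u) for the image of the point (alpha, y)."""
    w = square_map(complex(alpha, y))
    return w.imag ** 2 - 4 * alpha ** 2 * (alpha ** 2 - w.real)

def horizontal_line_residual(beta, x):
    """v^2 - 4 beta^2 (beta^2 + u) for the image of the point (x, beta)."""
    w = square_map(complex(x, beta))
    return w.imag ** 2 - 4 * beta ** 2 * (beta ** 2 + w.real)

vertical = [vertical_line_residual(1.0, y) for y in (-2.0, 0.0, 0.5, 1.5)]
horizontal = [horizontal_line_residual(1.5, x) for x in (-1.0, 0.0, 1.0, 2.0)]

# Image of the point z = 1 + 3i/2, which lies on both x = 1 and y = 3/2.
p_image = square_map(complex(1, 1.5))
print(vertical, horizontal, p_image)
```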
The more general mapping $w = z^n$ may be analysed in similar fashion,
though the algebraic complexity is naturally greater. When $n$ is integral the
mapping may be seen to transform the segment $0 < \arg z < 2\pi/n$ into the
complete w-plane with a suitable cut along the u-axis. (Care must be exercised
when $n$ is fractional for then the mapping is many valued. We shall not
pursue this matter further.)
10-5 (c) The inversion w = 1/z
For obvious reasons the mapping $w = 1/z$ is called the inversion mapping.
Its geometrical effect may be deduced by setting $w = \rho e^{i\phi}$, $z = re^{i\theta}$ to find
$$\rho(\cos\phi + i\sin\phi) = \frac{1}{r}(\cos\theta - i\sin\theta).$$
Arguing as with the function $w = z^2$, we then see that this implies that
$$\rho = 1/r, \qquad \phi = -\theta.$$
Expressed in words, the inversion mapping $w = 1/z$ transforms a point
in the z-plane with modulus $r$ and argument $\theta$ into a point in the w-plane
with modulus $1/r$ and argument $-\theta$. This may be interpreted geometrically
by appeal to Fig. 10-8 in which the w- and z-planes are shown superimposed
with a common origin, and $P$ is any point in the z-plane with $P'$ denoting its
image in the w-plane.
The circle shown in Fig. 10-8 is the unit circle $|z| = 1$, and point $Q$ on
the radius vector drawn from $O$ to $P$ is such that $OP \cdot OQ = 1$. Hence if
$OP = r$, then $OQ = 1/r$. In geometrical terms point $Q$ is said to have been
obtained by inverting point $P$ with respect to the unit circle. Point $P'$, which
is the image in the w-plane of the point $P$ in the z-plane, is then obtained by
reflecting $Q$ in the x-axis.
Thus the mapping $w = 1/z$ corresponds to the inversion of points $z$ with
respect to the unit circle, followed by their reflection in the real axis. The
inversion mapping thus maps the points interior to the unit circle about the
origin of the z-plane onto the exterior of the unit circle about the origin of
the w-plane, and vice-versa. The two unit circles map onto one another.
Fig. 10-8 Inversion in unit circle followed by reflection in the x-axis.
Algebraically, we write $w = u + iv$, $z = x + iy$, when
$$u = \frac{x}{x^2 + y^2}, \qquad v = \frac{-y}{x^2 + y^2}.$$
To learn how the line $x = \alpha$ in the z-plane maps onto the w-plane we need
only set $x = \alpha$ in the expressions for $u$ and $v$ and then eliminate $y$ to obtain
the equation
$$u^2 + v^2 - \frac{u}{\alpha} = 0.$$
Similarly, the line $y = \beta$ in the z-plane maps onto the curve in the w-plane
defined by the equation
$$u^2 + v^2 + \frac{v}{\beta} = 0.$$
When these equations are rewritten in the form
$$\left(u - \frac{1}{2\alpha}\right)^2 + v^2 = \left(\frac{1}{2\alpha}\right)^2$$
and
$$u^2 + \left(v + \frac{1}{2\beta}\right)^2 = \left(\frac{1}{2\beta}\right)^2$$
it is easily seen that the line $x = \alpha$ in the z-plane has for its image in the
w-plane a circle of radius $1/(2|\alpha|)$ with its centre at $(1/(2\alpha), 0)$, whilst the line
$y = \beta$ in the z-plane has for its image in the w-plane a circle of radius $1/(2|\beta|)$
with its centre at $(0, -1/(2\beta))$. We may conclude that lines parallel to the x- and y-axes
map onto circles in the w-plane which pass through the origin and have their
centres on the u- and v-axes.
Had the general straight line $y = mx + c$ in the z-plane been mapped,
then this same form of argument would have shown that any such line not
passing through the origin will transform into a circle through the origin in
the w-plane. Lines through the origin in the z-plane transform into lines
through the origin in the w-plane. The verification of these remarks is left
as an exercise for the reader.
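The circle that is the image of a line $x = \alpha$ under the inversion can be confirmed numerically. In this sketch the value $\alpha = 0.8$ and the sample values of $y$ are illustrative assumptions; every image point should lie at the distance $1/(2\alpha)$ from the point $(1/(2\alpha), 0)$.

```python
def inversion(z):
    return 1 / z

def distance_from_centre(alpha, y):
    """Distance of the image of (alpha, y) from the point (1/(2 alpha), 0)."""
    w = inversion(complex(alpha, y))
    return abs(w - complex(1 / (2 * alpha), 0))

alpha = 0.8                                   # illustrative value
expected_radius = 1 / (2 * alpha)
distances = [distance_from_centre(alpha, y)
             for y in (-5.0, -1.0, 0.0, 0.3, 2.0, 40.0)]
print(expected_radius, distances)
```

As $|y|$ increases the image point approaches the origin of the w-plane, which lies on the circle but is not itself the image of any finite point of the line.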
10-5 (d) The bilinear transformation
Any mapping of the general form
$$w = \frac{az + b}{cz + d},$$  (10-21)
is called a bilinear transformation or a linear fractional transformation. The
general linear transformation and the inversion mapping are special cases of
the bilinear transformation. We now show that bilinear transformations are
characterized by the property that they map circles and straight lines in the
z-plane onto circles and straight lines in the w-plane, though not necessarily
in this order.
Let us now write the transformation (10-21) in the form
$$w = \frac{a}{c} - \frac{ad - bc}{c^2\left(z + \dfrac{d}{c}\right)}.$$  (10-22)
We assume $c \neq 0$ and $ad - bc \neq 0$; this is justified since if $c = 0$ the
transformation reduces to the general linear transformation, whereas if
$ad - bc = 0$, then $w$ reduces to a constant. So, if we define new variables $z_1$ and $z_2$ by
$$z_1 = z + \frac{d}{c}, \qquad z_2 = \frac{1}{z_1},$$  (10-23)
then (10-22) becomes
$$w = \frac{a}{c} - \left(\frac{ad - bc}{c^2}\right) z_2.$$  (10-24)
We must now consider the sequential effect of the mappings that
transform from the z-plane to the w-plane via the intermediate planes $z_1$ and $z_2$.
The mapping from the z-plane to the $z_1$-plane is a pure translation and thus
leaves the shape and size of all curves invariant. The mapping from the
$z_1$-plane to the $z_2$-plane is an inversion and, as we have just seen, maps
straight lines not passing through the origin onto circles, and straight lines
through the origin onto straight lines. Finally, the mapping from the $z_2$-plane
to the w-plane is a general linear transformation and so comprises a dilatation,
a rotation, and a translation. Hence, in particular, this final mapping will transform
straight lines into straight lines and circles into circles. This justifies our
earlier statement that the bilinear transformation maps straight lines and
circles into straight lines and circles, though not necessarily in this order.
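The decomposition (10-23), (10-24) can be verified by comparing it with the direct evaluation of (10-21). The coefficients and sample points in this sketch are arbitrary illustrative choices, subject only to $c \neq 0$, $ad - bc \neq 0$ and $cz + d \neq 0$.

```python
def bilinear(z, a, b, c, d):
    """Direct evaluation of w = (az + b)/(cz + d)."""
    return (a * z + b) / (c * z + d)

def bilinear_in_steps(z, a, b, c, d):
    """Translation, inversion, then a general linear transformation."""
    z1 = z + d / c                                   # translation
    z2 = 1 / z1                                      # inversion
    return a / c - ((a * d - b * c) / c ** 2) * z2   # linear transformation

coeffs = (2 + 1j, -1, 1j, 3)          # a, b, c, d (illustrative values)
points = [0.5 + 0.25j, -2 + 1j, 4 - 3j]
max_gap = max(abs(bilinear(z, *coeffs) - bilinear_in_steps(z, *coeffs))
              for z in points)
print(max_gap)
```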
Example 10-17 Find the image in the w-plane of the circle $|z| = 2$ if
$$w = \frac{z - i}{z + i}.$$
Solution Setting $w = u + iv$, $z = x + iy$ we find that
$$u = \frac{x^2 + y^2 - 1}{x^2 + y^2 + 2y + 1}, \qquad v = \frac{-2x}{x^2 + y^2 + 2y + 1}.$$
Now the circle $|z| = 2$ has the equation $x^2 + y^2 = 4$, which used in the
expressions for $u$, $v$ gives
$$u = \frac{3}{2y + 5}, \qquad v = \frac{-2x}{2y + 5}.$$
Next, solving these for $x$ and $y$, we find
$$x = \frac{-3v}{2u}, \qquad y = \frac{1}{2}\left(\frac{3}{u} - 5\right),$$
so that on the required circle $x^2 + y^2 = 4$ this pair of equations is equivalent
to
$$3(u^2 + v^2) - 10u + 3 = 0.$$
When this equation is expressed in the form
$$\left(u - \frac{5}{3}\right)^2 + v^2 = \frac{16}{9}$$
it can be recognized as the equation of a circle in the w-plane having a
radius of 4/3 and its centre at the point (5/3, 0).
This conclusion could have been obtained more easily by using the
following argument. The equation
$$w = \frac{z - i}{z + i}$$
is equivalent to
$$z = i\left(\frac{1 + w}{1 - w}\right).$$
Hence, as $z\bar{z} = x^2 + y^2$, we have
$$x^2 + y^2 = i\left(\frac{1 + w}{1 - w}\right) \cdot (-i)\left(\frac{1 + \bar{w}}{1 - \bar{w}}\right) = \frac{1 + w + \bar{w} + w\bar{w}}{1 - w - \bar{w} + w\bar{w}}.$$
In terms of $w = u + iv$, $\bar{w} = u - iv$ this becomes
$$x^2 + y^2 = \frac{1 + 2u + u^2 + v^2}{1 - 2u + u^2 + v^2}$$
and, on the circle $x^2 + y^2 = 4$, it reduces to our previous result
$$3(u^2 + v^2) - 10u + 3 = 0.$$
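The conclusion of Example 10-17 may be checked by mapping a number of points of the circle $|z| = 2$ and measuring their distance from the point (5/3, 0). This is a sketch only; the choice of twelve equally spaced points is an illustrative assumption.

```python
import cmath
import math

def w_of(z):
    """The bilinear transformation w = (z - i)/(z + i) of Example 10-17."""
    return (z - 1j) / (z + 1j)

centre, radius = complex(5 / 3, 0), 4 / 3

# Twelve equally spaced points z = 2 exp(i t) on the circle |z| = 2.
angles = [2 * math.pi * k / 12 for k in range(12)]
deviations = [abs(abs(w_of(cmath.rect(2, t)) - centre) - radius)
              for t in angles]
print(max(deviations))
```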
10-6 Applications of conformal mapping
In any first account of the theory of conformal mapping, it is impossible to
do more than merely indicate its application in science and engineering.
From the fields of elasticity, electromagnetic theory, fluid mechanics, and
heat conduction in which these ideas play important roles, we choose just one
simple example. Our choice, from fluid mechanics, is solving the problem of
the two-dimensional flow of an incompressible fluid around the interior of a
wedge shaped region, on the assumption that the flow has a special property
which enables it to be classified as being irrotational. These are in fact con-
ditions which are usually valid in most low speed flows of ordinary fluids.
In books on fluid mechanics it is established that if $q_1$ and $q_2$ are the $x$
and $y$ components of velocity at a point in an incompressible inviscid fluid
that is undergoing two-dimensional flow, then under the stated conditions
these components may be written in the form
$$q_1 = \frac{\partial\phi}{\partial x}, \qquad q_2 = \frac{\partial\phi}{\partial y},$$  (10-25)
where $\phi(x, y)$ is a function called the velocity potential of the flow. The lines
$\phi(x, y) = \text{constant}$ are called equipotentials. Using the vector interpretation
of complex numbers we may thus represent the fluid velocity $q$ by the complex
variable
$$q = \frac{\partial\phi}{\partial x} + i\frac{\partial\phi}{\partial y}.$$  (10-26)
It can also be established that if fluid is neither created nor lost within the
flow region, then $\phi(x, y)$ must be such that
$$\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0.$$  (10-27)
Thus $\phi$ satisfies Laplace's equation and so is harmonic. Introducing the
harmonic conjugate of $\phi$, which we shall denote by $\psi(x, y)$, enables us to
define a further complex variable $F(z)$ by the equation
$$F(z) = \phi(x, y) + i\psi(x, y).$$  (10-28)
This is called the complex potential and $\psi(x, y)$ itself is called the stream
function of the flow. Now by the nature of the construction of $F(z)$, it is
differentiable in the sense of Definition 10-6 and so satisfies the Cauchy-Riemann
equations. Hence
$$\phi_x = \psi_y, \qquad \phi_y = -\psi_x$$
or, in terms of $q_1$ and $q_2$,
$$q_1 = \phi_x = \psi_y, \qquad q_2 = \phi_y = -\psi_x.$$  (10-29)
These relationships provide the justification for the name stream function,
for they show that the velocity vector is everywhere normal to the curves
$\phi(x, y) = \text{const}$. This follows because on $\phi(x, y) = \text{const}$, $\phi_x\,dx + \phi_y\,dy = 0$,
showing that
$$\frac{dy}{dx} = \frac{-\phi_x}{\phi_y}.$$
Hence if $n$ is the gradient of the normal to a curve $\phi(x, y) = \text{const}$, then
$n(dy/dx) = -1$, whence $n = \phi_y/\phi_x$. However, from results (10-29) this is
equivalent to $n = q_2/q_1$, which is the slope of the curve traced by a fluid
particle. Moreover, on a curve $\psi(x, y) = \text{const}$ we have $\psi_x\,dx + \psi_y\,dy = 0$, so
that by (10-29) its slope is $dy/dx = -\psi_x/\psi_y = q_2/q_1$ also. Hence the curves
$\psi(x, y) = \text{const}$ are curves along which fluid flows
and so can properly be called streamlines.
Consider the complex potential
$$F(z) = U_0 z,$$  (10-30)
where $U_0$ is a positive real number. Then we have at once
$$\phi = U_0 x, \qquad \psi = U_0 y.$$  (10-31)
The streamlines $\psi = \alpha$ are thus the lines $y = \alpha/U_0$, and the velocity $q$ is
$$q = \frac{\partial\phi}{\partial x} + i\frac{\partial\phi}{\partial y} = U_0.$$
Thus the complex potential $F(z) = U_0 z$ must characterize a uniform flow,
with velocity $U_0$ parallel to the x-axis and directed in the sense of increasing
$x$. This is illustrated in Fig. 10-9 (a).
Fig. 10-9 Transformation of fluid flow: (a) uniform flow in upper half plane (z-plane);
(b) flow inside wedge (w-plane).
Now if we consider the transformation
$$w = z^{1/3},$$  (10-32)
then we know from the arguments used in connection with the mapping
$w = z^n$ that it will map the upper half of the z-plane onto the wedge
$0 < \arg w < \frac{1}{3}\pi$ in the w-plane.
Then, as (10-32) is equivalent to $z = w^3$, we must have
$$x + iy = (u^3 - 3uv^2) + i(3u^2 v - v^3)$$
giving
$$x = u^3 - 3uv^2, \qquad y = 3u^2 v - v^3.$$  (10-33)
Hence the velocity potential is
$$\phi = U_0(u^3 - 3uv^2)$$  (10-34)
and the stream function
$$\psi = U_0(3u^2 v - v^3).$$  (10-35)
Thus the curves $\psi = \text{const}$ define the streamlines inside the wedge shaped
region, and some representative streamlines are shown in Fig. 10-9 (b). To
determine the speed at any point within the wedge we use the fact that
$$\frac{dF}{dz} = \frac{\partial\phi}{\partial x} + i\frac{\partial\psi}{\partial x} = q_1 - iq_2,$$
showing that the speed $|q|$ is given by
$$|q| = \left|\frac{dF}{dz}\right|.$$  (10-36)
As the complex potential is
$$F = U_0 w^3,$$  (10-37)
and the flow inside the wedge takes place in the w-plane, the differentiation
in (10-36) must now be performed with respect to $w$, so that we have
$$\frac{dF}{dw} = 3U_0 w^2$$
and, finally,
$$|q| = 3U_0|w^2| = 3U_0|(u + iv)^2| = 3U_0(u^2 + v^2).$$  (10-38)
Thus at a point $P$ with coordinates $(u_0, v_0)$ within the wedge, the speed
$|q| = 3U_0(u_0^2 + v_0^2)$. The streamline through the point $P$ is provided by Eqn
(10-35), for the constant associated with this streamline through $P$ must be
$U_0(3u_0^2 v_0 - v_0^3)$, so that the streamline itself has the equation
$$3u^2 v - v^3 = 3u_0^2 v_0 - v_0^3.$$
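The wedge flow can be explored numerically. In this sketch the value $U_0 = 2$ and the sample points are illustrative assumptions, and the speed is taken to be $|dF/dw|$ with $F = U_0 w^3$. The stream function (10-35) should vanish on the wall $\arg w = \frac{1}{3}\pi$ of the wedge, which is therefore itself a streamline.

```python
import math

U0 = 2.0  # assumed value of the uniform stream speed, for illustration only

def stream_function(u, v):
    """psi = U0 (3 u^2 v - v^3), Eqn (10-35)."""
    return U0 * (3 * u * u * v - v ** 3)

def speed(u, v):
    """|dF/dw| with F = U0 w^3, that is 3 U0 (u^2 + v^2)."""
    return abs(3 * U0 * complex(u, v) ** 2)

# Points on the wall arg w = pi/3 of the wedge, at distances t from the apex.
wall_values = [stream_function(t * math.cos(math.pi / 3),
                               t * math.sin(math.pi / 3))
               for t in (0.5, 1.0, 2.0)]

# Speed and streamline constant at an interior point (u0, v0) = (1, 0.5).
speed_at_p = speed(1.0, 0.5)
constant_at_p = stream_function(1.0, 0.5)
print(wall_values, speed_at_p, constant_at_p)
```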
As mentioned at the beginning of this section, conformal mapping has
many other applications, all related to solutions of Laplace's equation in
two dimensions. The application described here can provide no more than
an indication of one of these situations.
PROBLEMS
Section 10-1
10-1 Test the following sequences $\{z_n\}$ for convergence and, where appropriate,
find the limit, stating whether or not it is a member of the sequence.
(a) $z_n = 2^n + i\,3^{-n}$;
(b) $z_n = n\tan\dfrac{3}{n} + i\,n\sin\dfrac{1}{n}$;
< c > *» = „-(_i). + 4 ';
(e)z n = isin^+icos^.
10-2 Give examples of:
(a) a non-convergent sequence $\{z_n\}$;
(b) a convergent sequence $\{z_n\}$ with limit $2 + 3i$.
10-3 Given that the sequences {w n }, {z„} are defined by
-=( 1+ -I) + '-(5^t) - *— -l + 'fci^)-
find the limits of the sequences {w n + z„}, {w„z n } and {w„/z„}.
10-4 Identify the limit points of the sequence {z„} where
10-5 The general term of the sequence {z„} is
_ / 2«2 + 1 \ . / «>W \
*•" (3^ + 2 w + 3J + ' COS (^nj-
Find values of a for which $\{z_n\}$ has:
(a) one limit point,
(b) two limit points,
and state their location. Are the values of a unique?
10-6 Construct examples of a sequence $\{z_n\}$ which has:
(a) two limit points;
(b) three limit points;
(c) no limit points.
Section 10-2
10-7 Sketch each of the following curves defined in the complex plane:
(a) $x = s$, $y = \sqrt{1 - s^2}$ for $-1 < s < 1$;
(b) $x = a\sin s$, $y = b\cos s$ for $0 < s < 2\pi$ ($a$, $b$ real);
(c) $x = \cosh s$, $y = \sinh s$ for $-\infty < s < \infty$;
(d) $|z + 2 - i| = 3$;
(e) $z\bar{z} = 4$.
Sketch the region defined by each of the following sets of inequalities and
indicate when the boundary points belong to the region so defined.
10-8 Im(z + iz) > and Re z > 0.
10-9 2 < | z | < 3 with < arg z < \n.
10-10 1 < | z - 1 | < 2 and 1 < | z + 1 | < 2.
10-11 Sketch the region that lies inside the curve defined by
arg (z + 2) - arg (z + 3) = in
and is such that Im z > J.
Give an alternative representation of this region.
1012 Draw the curve C defined by
arg (z - - arg (z - 1) = \t*.
Problem 10-13
10-13 Define the figure-eight-shaped curve shown in the diagram in terms of argu-
ments of complex numbers. The curves Ci and Cz are arcs of circles with
centres Oi and O2, respectively.
10-14 Sketch a simply shaped region in the complex plane and define it:
(a) parametrically;
(b) directly in terms of z.
Section 10-3
10-15 For what values of $z$ are the following complex functions defined:
(a) $w = z^2 + iz + 1$; (b) $w = (z - 1)/(z - 2)$;
(c) $(z + 1)(z - i)(z^2 + 4)$; (d) $w = \sinh z$.
10- 16 If/(z) = u + iv, find the expressions for the functions u, v in terms of x, y
given that:
(a) /(z) = z 2 + zz + 1 ; (b) /(z) = £±i;
(c) /(z) = cosh z; (d) /(z) = cos z.
1017 Given the following forms of /(z) deduce their value if z = 1 + 2i:
(a) /(z) = x 2 + 3xj + iy 2 ;
x 2 + 2;> + 1.
x 2 + y
(c) /(z) = sin y (x 2 - ;> 2 ) + / cos ^f (x 2 + if),
(")/(*)= r24 ., ;2
10-18 Use Definition 10-4 to prove
$$\lim_{z \to 1 - i} (2z^2 - 1) = -(1 + 4i).$$
1019 Use Definition 10-4 to prove
Hm (£=!)- -6.
z— 8/2 \2z + 3 J
10-20 Use Definition 10-4 to
prove
(2 - /z)(z 2 - 1)
hm — r; — u — = 2( - 2 - ')•
z— 1 (.Z — 1)
10-21 Given that/(z) = z 2 + z - 2,g(z) = z + 2 deduce:
(a) lim[/(z) + 2 (? (z)];
z— 2
(b) limf(z)g(z);
z->-i
(0 Hm M
z-i-2^0)
10-22 Prove that
lim fc\ =
1*1-0 \ z J
by considering
lim
1*1
o \ zz /
writing z = x + iy, and then arriving at the result by displaying the function
whose limit is to be considered in terms of its real and imaginary parts.
Deduce that
lim
|s|-*o
(sin az\
— )-"■
where a may be a complex number.
10-23 Use the result
lim l^) = 1
UI-o\ z /
established in Problem 10-22 above together with the identity
— cos 2 z
(sin z\ 1
(l + :
to prove that
lim (Lz™l\ = o.
10-24 For what value of a is the function
3z for z t^ i
for z = i,
continuous at z = /.
10-25 Give an example of a function /(z) that :
(a) is continuous everywhere ;
(b) has a limit 3 + 2i as z — >- 1 + i, but is not continuous at z = 1 + /'.
10-26 Use Definition 10-5 to give a direct proof that /(z) = z 2 is continuous
everywhere.
10-27 Use the trigonometric identity
Iz + zo\ . /z - zo\
sin z — sin zo = 2 cos I — - — I . sin I — - — I
and the last result of Problem 10-22 above to give a direct proof that/(z) =
sin z is continuous for all z.
10-28 Give reasons to justify the assertion that
/(z) = z sin (z 2 + 3z + 2) + l/(z + 2 - /)
is continuous everywhere except at z = — 2 + i.
Section 10-4
10-29 Use Definition 10-6 to prove that if $w = az^2$, where $a$ is any constant, then
$$\frac{dw}{dz} = 2az$$
for all $z$.
10-30 Use Definition 10-6 to prove that if /(z) is a differentiable function of z in
some region, then in that region
& WW -& + >%■
10-31 By using the series representation of the hyperbolic sine function prove that
/sinh z\
Po(— J"
Then, using the identity
sinh zi — sinh Z2 = 2 sinh [(zi — z 2 )/2] cosh [{zi + z2>/2],
which may be derived directly from identity (6-29), show by means of
Definition 10-6 that if w = sinh z, then
dw
-r- = cosh z
dz
for all z.
10-32 Show by means of Definition 10-6 that the function /(z) = | z | is not differ-
entiable at the origin. Find the limiting value assumed by the difference
quotient at the origin (that is, with zo = 0) as h -»■ along the line y = he.
10-33 Determine which of the following functions /(z) satisfy the Cauchy-Riemann
equations :
(a) /-(z) = z 3 -;z 2 + 3;
(b)/(z) = cosh(z + 3/);
(c) /(z) = z sin z + zz;
(d) /(z) = (*3 - 3xy 2 ) + iQx 2 v - y 3 );
(e) /(z) = z(r + z)/2;
(f ) /(z) = sinh 3x cos j + i cosh 3x sin j\
10-34 Find the points, if any, at which the following functions are not analytic:
(a) /(z) = 3z + sinhz;
(b) /(z) = z\(z + 2);
(c) f(z) = cos 1/z;
(d)/(z) = |^.
10-35 Find the values of the constants a and b in order that the functions w should
satisfy the Cauchy-Riemann equations :
(a) w = a sin x cosh Z>/ + /2 cos jr sinh/;
(b) w = x 3 - oxy 2 - x + 1 + /'(3^ 2 - by 3 - 1).
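10-36 Using the method outlined in the text, show that if $x = r\cos\theta$, $y = r\sin\theta$,
then the polar form of the Cauchy-Riemann equations is:
$$\frac{\partial u}{\partial r} = \frac{1}{r}\frac{\partial v}{\partial\theta} \quad\text{and}\quad \frac{1}{r}\frac{\partial u}{\partial\theta} = -\frac{\partial v}{\partial r}.$$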
10-37 Determine which of the following functions /(z) satisfy the Cauchy-Riemann
equations :
(a) w = (r 2 cos 2 6 + 2) + ;> 2 sin 2 0;
(b) w - (r 3 cos 36 + 2r cos 6 + 4) + i(r 3 sin 36 + r sin «);
(c) w = |r + -| cos d + i (r - -\ sin 9;
(d) w = r 2 cos 2 9 + /V 2 sin 2 6 + 4;
(e) w = sin (r cos 9) . cosh (r sin 9) + / cos (r cos 9) . sinh (r sin 9).
10-38 Find the values of the constants a, b, and c in order that the following
functions should satisfy the Cauchy-Riemann equations:
(a) w = a log r + i(6 + br);
(b) w = r a cos £0 + ibr c sin \ d.
10-39 Verify that the following functions w satisfy the Cauchy-Riemann equations
and in each case express the derivative of w as a function of z:
(a) w = (x z — 3xy 2 + y) + i(3x 2 y — y 3 — x);
(b) w = (x sinh x cos y — y cosh x sin y) + i(y sinh x cos y
+ * cosh x sin _y) ;
(c) w = e ax (cos ay + i sin ay).
10-40 Find which of the following pairs of functions are harmonic conjugates.
Deduce the representation of w = u + iv in terms of z for the pairs that are
harmonic conjugates, first by using Rule 2 (a), and then by using Rule 2 (b) :
(a) u = x² − y² + 2y, v = 2x(y − 1);
(b) u = sin x cosh y, v = cos x sinh y;
(c) u = x sin x cosh y − y cos x sinh y,
v = −(x cos x sinh y + y sin x cosh y);
(d) u = sinh x cos y, v = cosh x sin y.
10-41 Show by differentiation that v = x² − y² + 2y is harmonic and deduce its
harmonic conjugate u. Express the function w = u + iv, and its derivative,
in terms of z.
10-42 Show by differentiation that u = cosh x cos y is harmonic and deduce its
harmonic conjugate v. Express the function w = u + iv, and its derivative,
in terms of z.
Section 10-5
10-43 Sketch the images in the w-plane of the line y = 2x − 1 in the z-plane that
result from the mappings:
(a) w = iz − (2 + i);
(b) w = 2z + 3;
(c) w = (1 + i)z + 1.
10-44 Determine the images in the w-plane of the circle |z − 1| = 1 in the z-plane
that result from the mappings:
(a) w = 2z − i;
(b) w = (i − 1)z + 2.
In each case shade the regions in the w-plane that correspond to the interior
of the circle |z − 1| = 1.
10-45 Sketch the region in the w-plane corresponding to the region x > 2, y < x in
the z-plane given that
w = (2i − 1)z + (1 + i).
10-46 Determine the equation of the line in the w-plane which is the image of the
line x = 1 in the z-plane under the mapping
w = z³.
10-47 Give an algebraic proof that if c ≠ 0, then the general straight line y = mx
+ c in the z-plane is mapped by the transformation w = 1/z onto a circle
in the w-plane.
10-48 Find the image in the w-plane of the circle |z| = 2 if
\[ w = \frac{2z + i}{z - 5i}. \]
10-49 Show that w = e^z maps the straight lines y = const in the z-plane onto
straight lines through the origin in the w-plane, and the straight lines x =
const in the z-plane onto circles about the origin in the w-plane.
10-50 Locate the critical points of w = sin z and show that it maps the region
−½π < x < ½π, y > 0 in the z-plane onto the upper half of the w-plane.
Section 10-6
[Figure for Problem 10-51: flow in the z-plane.]
10-51 Using the argument given in the text, show how the complex potential
F(z) = U₀z and the mapping w = z^{1/2} may be used to find the streamlines
indicated in the figure.
Find the speed of flow at a point P with coordinates (u₀, v₀) and determine
the streamline and the equipotential through P.
Scalars, vectors, and fields
11-1 Curves in space
If the coordinates (x, y, z) of a point P in space are described by
\[ x = f(t), \quad y = g(t), \quad z = h(t), \tag{11-1} \]
where f, g, h are continuous functions of t, then as t increases so the point P
moves in space tracing out some curve. It follows that Eqns (11-1) represent
a parametric description of a curve Γ in space and, furthermore, that they
define a direction along the curve Γ corresponding to the direction in which
P moves as t increases. For example, the parametric equations
\[ x = 2\cos 2\pi t, \quad y = 2\sin 2\pi t, \quad z = 2t, \]
for 0 ≤ t ≤ 1 describe one turn of a helix, as may be seen by noticing that
the projection of the point P on the (x, y)-plane traces one revolution of the
circle x² + y² = 4 as t increases from t = 0 to t = 1, whilst the z-coordinate
of P steadily increases from z = 0 to z = 2.
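The helix example can be checked numerically. The following is only a sketch, assuming the parameter interval 0 ≤ t ≤ 1 is sampled at 101 equally spaced values; the names helix_point, samples, max_circle_error and z_is_increasing are illustrative and do not come from the text.

```python
import math

def helix_point(t):
    """Point on the helix x = 2 cos 2*pi*t, y = 2 sin 2*pi*t, z = 2t."""
    return (2 * math.cos(2 * math.pi * t),
            2 * math.sin(2 * math.pi * t),
            2 * t)

# Sample the parameter interval 0 <= t <= 1 (an assumed sampling density).
samples = [helix_point(i / 100) for i in range(101)]

# Largest deviation of the projected point from the circle x^2 + y^2 = 4.
max_circle_error = max(abs(x * x + y * y - 4) for x, y, _ in samples)

# The z-coordinate should rise steadily from 0 to 2.
z_values = [z for _, _, z in samples]
z_is_increasing = all(b > a for a, b in zip(z_values, z_values[1:]))
```

The projection stays on the circle to rounding error, and the z-coordinate increases monotonically, which is what the geometric argument in the text asserts.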
If we now denote by r the position vector OP of a point P on Γ relative
to the origin O of our coordinate system, and introduce the triad of
orthogonal unit vectors i, j, k used in Chapter 4, it follows that (Fig. 11-1)
\[ \mathbf{r} = f(t)\mathbf{i} + g(t)\mathbf{j} + h(t)\mathbf{k}. \tag{11-2} \]
Expressions of this form are called vector functions of one real variable,
in which the dependence on the parameter t is often displayed concisely by
writing r = r(t). The name vector function arises because r is certainly a
vector and, as it depends on the real independent variable t, it must also be a
function in the sense that to each t there corresponds a vector r(t). Knowledge
of the vector function r(t) implies knowledge of the three scalar functions
f, g, and h, and conversely.
The geometrical analogy used here to interpret a general vector function
r(t) is particularly valuable in dynamics where the point P(t) with position
vector r(t) usually represents a moving particle, and the curve Γ its trajectory
in space. Under these conditions it is frequently most convenient if the
parameter t is identified with the time, though in some circumstances
identification with the distance s to P measured along Γ from some fixed point on
Γ is preferable. Useful though these geometrical and dynamical analogies are,
we shall in the main use them only to help further our understanding of
general vector functions.
Fig. 11-1 Vector function of one variable interpreted as a curve in space.
The name vector function suggests, correctly, that it is possible to give
satisfactory meanings to the terms limit, continuity, and derivative when
applied to r(t). As in the ordinary calculus, the key concept is that of a limit.
Intuitively the idea of a limit is clear: when we say u(t) tends to a limit v as
t → t₀, we mean that when t is close to t₀, the vector function u(t) is in some
sense close to the vector v. In what sense though can the two vectors u(t)
and v be said to be close to one another? Ultimately, all that is necessary is
to interpret this as meaning that |u(t) − v| is small.
So, we shall say that u(t) tends to the limit v as t → t₀ if, by taking t
sufficiently close to t₀, it is possible to make |u(t) − v| arbitrarily small. As
with our previous notion of continuity we shall then say that u(t) is continuous
at t₀ if $\lim_{t \to t_0} \mathbf{u}(t) = \mathbf{v}$ and, in addition, u(t₀) = v. We incorporate these ideas
into a formal definition as follows:
definition 11-1 (vector functions: limits and continuity) Let u(t) =
u₁(t)i + u₂(t)j + u₃(t)k and v = v₁i + v₂j + v₃k, then if for any ε > 0 there
is some number δ such that
\[ |\mathbf{u}(t) - \mathbf{v}| < \varepsilon \quad \text{when} \quad |t - t_0| < \delta, \]
we shall say that u(t) tends to the limit v as t → t₀, and write
\[ \lim_{t \to t_0} \mathbf{u}(t) = \mathbf{v}. \]
If in addition u(t₀) = v, then u(t) will be said to be continuous at t = t₀. A
vector function that is continuous at all points in the interval a < t < b will
be said to be continuous throughout that interval.
As usual, a vector function that is not continuous at t = t₀ will be said
to be discontinuous. It is obvious from this definition that u(t) can only
tend to the limit v as t → t₀ if the limit of each component of u(t) is equal to
the corresponding component of the vector v. Thus the limit of a vector
function of one variable is directly related to the limits of the three scalar
functions of one variable u₁(t), u₂(t), and u₃(t). This is proved by writing
\[ |\mathbf{u}(t) - \mathbf{v}| = \left[(u_1(t) - v_1)^2 + (u_2(t) - v_2)^2 + (u_3(t) - v_3)^2\right]^{1/2}, \]
showing that |u(t) − v| < ε as t → t₀ is only possible if
\[ \lim_{t \to t_0} (u_i(t) - v_i) = 0 \quad \text{for } i = 1, 2, 3, \]
or
\[ \lim_{t \to t_0} u_1(t) = v_1, \quad \lim_{t \to t_0} u_2(t) = v_2, \quad \lim_{t \to t_0} u_3(t) = v_3. \]
A systematic application of these arguments enables the following theorem
to be proved.
theorem 11-1 (continuous vector functions) If the vector functions
u(t), v(t) are defined and continuous throughout the interval a < t < b, then
the vector functions u(t) + v(t), u(t) × v(t), and the scalar function u(t) . v(t)
are also defined and continuous throughout that same interval.
Example 11-1 At what points are the vector functions u(t), v(t) discontinuous
if
\[ \mathbf{u}(t) = \sin t\,\mathbf{i} + \sec t\,\mathbf{j} + \frac{1}{t-1}\,\mathbf{k}, \]
\[ \mathbf{v}(t) = t\,\mathbf{i} + (1 + t^2)\,\mathbf{j} + e^{t}\,\mathbf{k}. \]
Verify by direct calculation that u(t) + v(t), u(t) . v(t), and u(t) × v(t) are
continuous functions in any interval not containing a point of discontinuity
of u(t) or v(t).
Solution The i component of u(t) is defined and continuous for all t, whereas
the j component is discontinuous for t = (2n + 1)π/2 with n = 0, ±1, ±2,
. . . and the k component is discontinuous for the single value t = 1. All
three components of v(t) are continuous for all t. We have by vector addition
\[ \mathbf{u}(t) + \mathbf{v}(t) = (t + \sin t)\,\mathbf{i} + (1 + t^2 + \sec t)\,\mathbf{j} + \left(e^{t} + \frac{1}{t-1}\right)\mathbf{k}, \]
showing that the components of u(t) + v(t) give rise to the same points of
discontinuity as the function u(t). We may thus conclude that the vector sum
is continuous throughout any interval not containing one of these points.
For example, u(t) + v(t) is continuous in both the open interval (π/2, 3π/2)
and the closed interval [5, 7] but it is discontinuous in (0, π).
The scalar product u(t) . v(t) is given by
\[ \mathbf{u}(t) \cdot \mathbf{v}(t) = t \sin t + (1 + t^2)\sec t + \frac{e^{t}}{t-1}, \]
which is, of course, a scalar. Again we see by inspection that the scalar
product is continuous in any interval not containing a point of discontinuity
of u(t).
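The vector product u(t) × v(t) is
\[ \mathbf{u}(t) \times \mathbf{v}(t) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \sin t & \sec t & 1/(t-1) \\ t & 1 + t^2 & e^{t} \end{vmatrix} \]
giving,
\[ \mathbf{u}(t) \times \mathbf{v}(t) = \left[e^{t}\sec t - \frac{1 + t^2}{t-1}\right]\mathbf{i} + \left[\frac{t}{t-1} - e^{t}\sin t\right]\mathbf{j} + \left[(1 + t^2)\sin t - t\sec t\right]\mathbf{k}. \]
Here also inspection of the components shows that the vector product is
continuous in any interval not containing a point of discontinuity of u(t).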
The following definition (interpreted later) shows that, as might be ex-
pected, the idea of a derivative can also be applied to vector functions of one
variable.
definition 11-2 (derivative of vector function) Let u(t) be a continuous
vector function throughout some interval a < t < b at each point of which
the limit
\[ \lim_{\Delta t \to 0} \frac{\mathbf{u}(t + \Delta t) - \mathbf{u}(t)}{\Delta t} \]
is defined. Then u(t) is said to be differentiable throughout that interval with
the derivative
\[ \frac{d\mathbf{u}}{dt} = \lim_{\Delta t \to 0} \frac{\mathbf{u}(t + \Delta t) - \mathbf{u}(t)}{\Delta t}. \]
The geometrical interpretation of the derivative of a vector function of a
real variable is apparent in Fig. 11-2. In that figure the curve Γ is described
by a point P(t) with position vector u(t) relative to O. The point denoted by
P'(t + Δt) is the position assumed by u at time t + Δt, so that OP = u(t),
OP' = u(t + Δt), and PP' = Δu is the increment in u(t) consequent upon
the increment Δt in t.
Fig. 11-2 Geometrical interpretation of du/dt.
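It is obvious that as Δt → 0, so the vector Δu tends to the line of the
tangent to the curve Γ at P(t) with Δu being directed from P to P'. To
interpret du/dt in terms of components when u(t) = u₁(t)i + u₂(t)j + u₃(t)k, we
need only observe that
\[ \frac{d\mathbf{u}}{dt} = \lim_{\Delta t \to 0} \frac{\mathbf{u}(t + \Delta t) - \mathbf{u}(t)}{\Delta t} \]
\[ = \lim_{\Delta t \to 0}\left[\frac{u_1(t + \Delta t) - u_1(t)}{\Delta t}\right]\mathbf{i} + \lim_{\Delta t \to 0}\left[\frac{u_2(t + \Delta t) - u_2(t)}{\Delta t}\right]\mathbf{j} + \lim_{\Delta t \to 0}\left[\frac{u_3(t + \Delta t) - u_3(t)}{\Delta t}\right]\mathbf{k}, \]
from which it follows that
\[ \frac{d\mathbf{u}}{dt} = \frac{du_1}{dt}\mathbf{i} + \frac{du_2}{dt}\mathbf{j} + \frac{du_3}{dt}\mathbf{k}. \tag{11-3} \]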
The unit vector T that is tangent to Γ at P(t) and points in the direction in
which P(t) will move with increasing t is obviously
\[ \mathbf{T} = \frac{d\mathbf{u}/dt}{\left|d\mathbf{u}/dt\right|}. \tag{11-4} \]
If s is the distance to P measured positively in the sense P to P' along Γ
from some fixed point on that curve (Fig. 11-2), then we know from our
work with differentials that du₁ = u₁′dt, du₂ = u₂′dt, du₃ = u₃′dt. Now as
the differentials du₁, du₂, du₃ are mutually orthogonal and represent the
increments in the coordinates [u₁(t), u₂(t), u₃(t)] of P to an adjacent point
distant ds away along Γ with coordinates [u₁(t + dt), u₂(t + dt), u₃(t + dt)],
we may apply Pythagoras' theorem to obtain
\[ (ds)^2 = (u_1'\,dt)^2 + (u_2'\,dt)^2 + (u_3'\,dt)^2, \]
whence
\[ \frac{ds}{dt} = \left[\left(\frac{du_1}{dt}\right)^2 + \left(\frac{du_2}{dt}\right)^2 + \left(\frac{du_3}{dt}\right)^2\right]^{1/2}. \tag{11-5} \]
Comparison of Eqns (11-3) and (11-5) then gives the result
\[ \left|\frac{d\mathbf{u}}{dt}\right| = \frac{ds}{dt}, \tag{11-6} \]
from which we see that if t is regarded as time, then the vector function
v = du/dt is the velocity vector of P(t) as it moves with speed ds/dt along Γ
in the direction of T. These results merit recording as a theorem.
theorem 11-2 Let u(t) = u₁(t)i + u₂(t)j + u₃(t)k be a differentiable
vector function of the real variable t, then
\[ \frac{d\mathbf{u}}{dt} = \frac{du_1}{dt}\mathbf{i} + \frac{du_2}{dt}\mathbf{j} + \frac{du_3}{dt}\mathbf{k}. \]
If Γ denotes the curve traced out by the point P(t) with position vector u(t)
as t increases, and s is the distance to P(t) measured along Γ from some fixed
point, then
\[ \frac{ds}{dt} = \left|\frac{d\mathbf{u}}{dt}\right|, \]
and the unit tangent T to the curve Γ at P(t) oriented in the sense of increasing
t is
\[ \mathbf{T} = \frac{d\mathbf{u}/dt}{\left|d\mathbf{u}/dt\right|}. \]
As a consequence of this theorem we may write
\[ \frac{d\mathbf{u}}{dt} = \frac{ds}{dt}\,\frac{d\mathbf{u}/dt}{\left|d\mathbf{u}/dt\right|} = \frac{ds}{dt}\,\mathbf{T}, \tag{11-7} \]
which is a result of considerable use in dynamics when t is identified with
time.
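The relationship between the derivative, the speed ds/dt and the unit tangent T can be illustrated numerically. The sketch below assumes a hypothetical curve u(t) = cos t i + sin t j + t k, which is not taken from the text, and estimates du/dt by a central difference rather than by exact differentiation.

```python
import math

def u(t):
    """Hypothetical example curve: u(t) = cos t i + sin t j + t k."""
    return (math.cos(t), math.sin(t), t)

def derivative(f, t, h=1e-5):
    """Central-difference estimate of du/dt, taken component by component."""
    fp, fm = f(t + h), f(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(fp, fm))

def norm(v):
    """Euclidean length of a vector given as a tuple of components."""
    return math.sqrt(sum(c * c for c in v))

def speed_and_tangent(f, t):
    """Return ds/dt = |du/dt| and the unit tangent T = (du/dt)/|du/dt|."""
    du = derivative(f, t)
    speed = norm(du)
    return speed, tuple(c / speed for c in du)

speed, T = speed_and_tangent(u, 1.0)
```

For this assumed curve the exact derivative is −sin t i + cos t j + k, so the speed should be close to √2 for every t, and multiplying T by the speed recovers du/dt as in Eqn (11-7).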
Higher order derivatives such as d²u/dt² and d³u/dt³ may also be defined
in the obvious fashion as d²u/dt² = (d/dt)(du/dt), d³u/dt³ = (d/dt)(d²u/dt²)
provided only that the components of u(t) have suitable differentiability
properties. Thus, for example, if the second derivatives of the components of
u(t) exist we have
\[ \frac{d^2\mathbf{u}}{dt^2} = \frac{d^2u_1}{dt^2}\mathbf{i} + \frac{d^2u_2}{dt^2}\mathbf{j} + \frac{d^2u_3}{dt^2}\mathbf{k}. \tag{11-8} \]
We have seen that if t is identified with time and u(t) is the position vector
of a point P, then du/dt is the velocity vector of P. It follows from this same
argument that d²u/dt² is the acceleration vector of P.
Example 11-2 The position vector r of a particle at time t is given by
\[ \mathbf{r} = a\cos\omega t\,\mathbf{i} + a\sin\omega t\,\mathbf{j} + \alpha t^2\,\mathbf{k}, \]
where i, j, k have their usual meanings and a, ω, and α are constants. Find
the acceleration vector at time t, and deduce the times at which it will be
perpendicular to the position vector. Hence deduce the unit tangent to the
particle trajectory at these times.
Solution By making the identifications u = r, u₁(t) = a cos ωt, u₂(t) =
a sin ωt and u₃(t) = αt² and then applying Theorem 11-2 we find that the
velocity vector is
\[ \frac{d\mathbf{r}}{dt} = -a\omega\sin\omega t\,\mathbf{i} + a\omega\cos\omega t\,\mathbf{j} + 2\alpha t\,\mathbf{k}. \]
A further differentiation yields the required acceleration vector
\[ \frac{d^2\mathbf{r}}{dt^2} = -a\omega^2\cos\omega t\,\mathbf{i} - a\omega^2\sin\omega t\,\mathbf{j} + 2\alpha\,\mathbf{k}. \]
Expressed vectorially, the condition that r and d²r/dt² should be perpendicular
is simply that r . (d²r/dt²) = 0. Hence to find the time at which this condition
is satisfied we must solve the equation
\[ (a\cos\omega t\,\mathbf{i} + a\sin\omega t\,\mathbf{j} + \alpha t^2\,\mathbf{k}) \cdot (-a\omega^2\cos\omega t\,\mathbf{i} - a\omega^2\sin\omega t\,\mathbf{j} + 2\alpha\,\mathbf{k}) = 0. \]
Forming the required scalar product gives
\[ -a^2\omega^2\cos^2\omega t - a^2\omega^2\sin^2\omega t + 2\alpha^2 t^2 = 0, \]
which immediately simplifies to
\[ a^2\omega^2 = 2\alpha^2 t^2, \]
showing that the desired times are
\[ t = \pm\frac{a\omega}{\alpha\sqrt{2}}. \]
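To deduce the unit tangent T at these times we use the fact that
\[ \mathbf{T} = \frac{d\mathbf{r}/dt}{\left|d\mathbf{r}/dt\right|}, \]
where here
\[ \left|\frac{d\mathbf{r}}{dt}\right| = \sqrt{a^2\omega^2 + 4\alpha^2 t^2}. \]
Denoting by T± the unit tangent to the trajectory at t = ±aω/(α√2), we find
by substitution of these values of t in the above expression that
\[ \mathbf{T}_{+} = \frac{1}{\sqrt{3}}\left(-\sin\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{i} + \cos\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{j} + \sqrt{2}\,\mathbf{k}\right) \]
and
\[ \mathbf{T}_{-} = \frac{1}{\sqrt{3}}\left(\sin\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{i} + \cos\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{j} - \sqrt{2}\,\mathbf{k}\right). \]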
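The result of Example 11-2 can be spot-checked with numbers. This is a sketch only: the values chosen for a, ω and α are arbitrary assumptions, and the function names are illustrative rather than anything defined in the text.

```python
import math

# Arbitrary illustrative constants (assumptions, not values from the text).
a, omega, alpha = 1.5, 2.0, 0.7

def r(t):
    """Position vector r = a cos(wt) i + a sin(wt) j + alpha t^2 k."""
    return (a * math.cos(omega * t), a * math.sin(omega * t), alpha * t * t)

def velocity(t):
    """dr/dt, differentiated component by component (Theorem 11-2)."""
    return (-a * omega * math.sin(omega * t),
            a * omega * math.cos(omega * t),
            2 * alpha * t)

def accel(t):
    """d2r/dt2, the acceleration vector."""
    return (-a * omega ** 2 * math.cos(omega * t),
            -a * omega ** 2 * math.sin(omega * t),
            2 * alpha)

def dot(p, q):
    return sum(x * y for x, y in zip(p, q))

# Positive time at which r and the acceleration should be perpendicular.
t_plus = a * omega / (alpha * math.sqrt(2))

perpendicularity = dot(r(t_plus), accel(t_plus))
speed = math.sqrt(dot(velocity(t_plus), velocity(t_plus)))
T_plus = tuple(c / speed for c in velocity(t_plus))
```

At this time the scalar product vanishes to rounding error, the speed equals aω√3, and the k component of the unit tangent is √2/√3, in agreement with the expression for T₊.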
With the obvious differentiability requirements, if u(t) and v(t) are
differentiable vector functions with respect to t, then so also are u + v, u . v,
u × v, and φu, where φ = φ(t) is a scalar function of t. As the following
theorem is easily proved by resolution of the vector functions involved into
component form, it is stated without proof.
theorem 11-3 (differentiation, sums and products of vector functions) If
u(t) and v(t) are differentiable vector functions throughout some interval
a < t < b and φ(t) is a differentiable scalar function throughout that same
interval then,
(a) $\frac{d}{dt}(\mathbf{u} + \mathbf{v}) = \frac{d\mathbf{u}}{dt} + \frac{d\mathbf{v}}{dt}$;
(b) $\frac{d}{dt}(\phi\mathbf{u}) = \frac{d\phi}{dt}\mathbf{u} + \phi\frac{d\mathbf{u}}{dt}$;
(c) $\frac{d}{dt}(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot \frac{d\mathbf{v}}{dt} + \frac{d\mathbf{u}}{dt} \cdot \mathbf{v}$;
(d) $\frac{d}{dt}(\mathbf{u} \times \mathbf{v}) = \mathbf{u} \times \frac{d\mathbf{v}}{dt} + \frac{d\mathbf{u}}{dt} \times \mathbf{v}$;
and, if c is a constant vector,
(e) $\frac{d\mathbf{c}}{dt} = \mathbf{0}$;
where the order of the vector products on the right-hand side of (d) must be
strictly observed.
When considering the geometry of twisted curves in space it is convenient
to identify points on a curve Γ by specifying their distance s measured along
the curve itself from some fixed point. This is of course equivalent to
identifying t with s in the position vector r(t) so that Γ is then defined as the locus
of the points having the equation r = r(s). This equation is called the intrinsic
equation of the curve Γ. In terms of the intrinsic equation it follows from
Eqn (11-7) that the unit tangent T to the curve Γ at r = r(s) is
\[ \mathbf{T} = \frac{d\mathbf{r}}{ds}. \tag{11-9} \]
Now although T is a vector function of s, it is also a unit vector, and so
T . T = 1. Differentiating this scalar product with respect to s by means of
Theorem 11-3 (c) then gives
\[ \frac{d\mathbf{T}}{ds} \cdot \mathbf{T} + \mathbf{T} \cdot \frac{d\mathbf{T}}{ds} = 0 \]
or, as vectors in a scalar product commute,
\[ \mathbf{T} \cdot \frac{d\mathbf{T}}{ds} = 0. \]
Hence, provided dT/ds ≠ 0, the derivative of the unit tangent T with respect
to s is normal to T. Next, denoting by N the unit vector along dT/ds,
we define the essentially positive scalar function κ = κ(s) by means of the
equation
\[ \frac{d\mathbf{T}}{ds} = \kappa\mathbf{N}. \tag{11-10} \]
Here κ is called the curvature of the curve at the point in question, and on
account of the relationship between T and N, the vector N is called the
principal unit normal to the curve Γ at that point. As κ is positive by definition
and N is a unit vector it follows from Eqn (11-10) that
\[ \kappa = \left|\frac{d\mathbf{T}}{ds}\right|. \tag{11-11} \]
It is convenient to define a third and mutually orthogonal unit vector B
called the unit binormal by means of the equation
\[ \mathbf{B} = \mathbf{T} \times \mathbf{N}. \tag{11-12} \]
The three unit vectors B, T, and N are, in general, all functions of s and they
serve as a specially useful triad of mutually orthogonal unit reference vectors
at points on the curve Γ. It is important to appreciate that in general B, T,
N, and κ vary from point to point on the curve Γ, being always defined in
relation to the local properties of the curve in question. The positive number
ρ = 1/κ defined at each point of the curve Γ is called the radius of curvature
of the curve at that point.
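Example 11-3 Find B, T, N, and the scalars κ, ρ for the curve defined
parametrically in terms of t by the expression
\[ \mathbf{r} = 2\cos(t + \mu)\,\mathbf{i} - 2\sin(t + \mu)\,\mathbf{j} + 4t\,\mathbf{k}, \]
where μ is a constant. Hence deduce the values of these quantities at the point
on the curve corresponding to t = 0.
Solution First notice that t is not the arc length s along the curve, because
were this the case then it would follow that ds/dt = 1, whereas from Eqn
(11-5) we have
\[ \frac{ds}{dt} = \sqrt{4\sin^2(t + \mu) + 4\cos^2(t + \mu) + 16} = 2\sqrt{5}. \]
Now, using Eqn (11-9) we have
\[ \mathbf{T} = \frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}}{dt}\,\frac{dt}{ds} = \frac{d\mathbf{r}/dt}{ds/dt}, \]
whence
\[ \mathbf{T} = \frac{1}{2\sqrt{5}}\left(\frac{d\mathbf{r}}{dt}\right). \]
Thus
\[ \mathbf{T} = \frac{1}{2\sqrt{5}}\frac{d}{dt}\bigl(2\cos(t + \mu)\,\mathbf{i} - 2\sin(t + \mu)\,\mathbf{j} + 4t\,\mathbf{k}\bigr) = \frac{1}{2\sqrt{5}}\bigl(-2\sin(t + \mu)\,\mathbf{i} - 2\cos(t + \mu)\,\mathbf{j} + 4\,\mathbf{k}\bigr), \]
and so
\[ \mathbf{T} = -\frac{1}{\sqrt{5}}\bigl(\sin(t + \mu)\,\mathbf{i} + \cos(t + \mu)\,\mathbf{j} - 2\,\mathbf{k}\bigr). \]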
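Next, to find N and κ we write Eqn (11-10) as
\[ \kappa\mathbf{N} = \frac{d\mathbf{T}}{ds} = \frac{d\mathbf{T}}{dt}\,\frac{dt}{ds} = \frac{d\mathbf{T}/dt}{ds/dt}. \]
Hence
\[ \kappa\mathbf{N} = \frac{1}{2\sqrt{5}}\frac{d}{dt}\left(\frac{-\sin(t + \mu)\,\mathbf{i} - \cos(t + \mu)\,\mathbf{j} + 2\,\mathbf{k}}{\sqrt{5}}\right) = \frac{1}{10}\bigl(-\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}\bigr). \]
Using κ = |dT/ds|, it then follows that
\[ \kappa = \frac{1}{10}\,\bigl|-\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}\bigr| = \frac{1}{10} \]
and, consequently, that
\[ \mathbf{N} = -\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}. \]
Since the radius of curvature ρ is defined by the relationship ρ = 1/κ, we have
ρ = 10.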
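Finally, using the definition B = T × N gives
\[ \mathbf{B} = -\frac{1}{\sqrt{5}}\bigl(\sin(t + \mu)\,\mathbf{i} + \cos(t + \mu)\,\mathbf{j} - 2\,\mathbf{k}\bigr) \times \bigl(-\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}\bigr), \]
whence
\[ \mathbf{B} = -\frac{1}{\sqrt{5}}\bigl(2\sin(t + \mu)\,\mathbf{i} + 2\cos(t + \mu)\,\mathbf{j} + \mathbf{k}\bigr). \]
The point on the curve corresponding to t = 0 is r(0) = 2 cos μ i − 2 sin μ j,
and at this point:
\[ \mathbf{T}(0) = -\frac{1}{\sqrt{5}}(\sin\mu\,\mathbf{i} + \cos\mu\,\mathbf{j} - 2\,\mathbf{k}), \]
\[ \mathbf{N}(0) = -\cos\mu\,\mathbf{i} + \sin\mu\,\mathbf{j}, \]
\[ \mathbf{B}(0) = -\frac{1}{\sqrt{5}}(2\sin\mu\,\mathbf{i} + 2\cos\mu\,\mathbf{j} + \mathbf{k}). \]
The curvature κ = 1/10 is independent of t, and so κ is the same for all
points on the curve, as is the radius of curvature ρ = 1/κ = 10.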
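The values found in Example 11-3 can be cross-checked by a different route. The sketch below does not follow the method of the text: it assumes the standard formulas κ = |r′ × r″|/|r′|³ and B = (r′ × r″)/|r′ × r″| for a curve given in terms of a general parameter t, which are not derived in this section, and it uses an arbitrary value for the constant μ.

```python
import math

MU = 0.3  # arbitrary value assumed for the constant mu

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def norm(p):
    return math.sqrt(sum(c * c for c in p))

def frenet(t):
    """Return (T, N, B, kappa) for r = 2cos(t+mu) i - 2sin(t+mu) j + 4t k."""
    c, s = math.cos(t + MU), math.sin(t + MU)
    r1 = (-2 * s, -2 * c, 4.0)          # dr/dt
    r2 = (-2 * c, 2 * s, 0.0)           # d2r/dt2
    w = cross(r1, r2)
    speed = norm(r1)                    # ds/dt
    T = tuple(x / speed for x in r1)    # unit tangent
    B = tuple(x / norm(w) for x in w)   # unit binormal (standard formula)
    N = cross(B, T)                     # N = B x T, as in Eqn (11-15)
    kappa = norm(w) / speed ** 3        # curvature (standard formula)
    return T, N, B, kappa

T0, N0, B0, kappa0 = frenet(0.0)
```

With these assumptions the curvature comes out as 1/10 for every t, and T, N, B at t = 0 agree with the closed forms T(0), N(0), B(0) of the example.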
Thus far we have defined the triad of unit vectors B, T, and N which serve
as a moving set of reference vectors along the curve Γ. We have also
calculated the derivative dT/ds, and to complete our examination of these
vectors it only remains for us to find dB/ds and dN/ds. For our starting point
we take Eqn (11-12), which we differentiate with respect to s, using Theorem
11-3 (d), to obtain
\[ \frac{d\mathbf{B}}{ds} = \frac{d\mathbf{T}}{ds} \times \mathbf{N} + \mathbf{T} \times \frac{d\mathbf{N}}{ds} \]
which, on account of Eqn (11-10), reduces to
\[ \frac{d\mathbf{B}}{ds} = \mathbf{T} \times \frac{d\mathbf{N}}{ds}. \]
Next, forming the vector product of this equation with N and expanding the
resulting triple vector product on the right-hand side gives
\[ \mathbf{N} \times \frac{d\mathbf{B}}{ds} = \left(\mathbf{N} \cdot \frac{d\mathbf{N}}{ds}\right)\mathbf{T} - (\mathbf{N} \cdot \mathbf{T})\frac{d\mathbf{N}}{ds}. \]
However as N is a unit vector it follows, as in the derivation of Eqn (11-10),
that N . (dN/ds) = 0, whilst the orthogonality of N and T implies that
N . T = 0. Thus,
\[ \mathbf{N} \times \frac{d\mathbf{B}}{ds} = \mathbf{0}, \]
and hence the vectors N and dB/ds must be parallel, differing only by a scalar
factor. This scalar factor is usually a function of s and it is called the torsion
of the curve Γ. Torsion is conventionally denoted by −τ, so we can write
\[ \frac{d\mathbf{B}}{ds} = -\tau\mathbf{N}. \tag{11-13} \]
If required, the torsion τ may be calculated by using the obvious result
\[ \tau = -\mathbf{N} \cdot \frac{d\mathbf{B}}{ds}. \tag{11-14} \]
See Problems 11-16 to 11-18 for an alternative treatment of the calculation
of ρ and τ.
The manner of construction of B, T, and N is such that they form a
right-handed set in this order and, consequently,
\[ \mathbf{B} = \mathbf{T} \times \mathbf{N}, \quad \mathbf{T} = \mathbf{N} \times \mathbf{B}, \quad \mathbf{N} = \mathbf{B} \times \mathbf{T}. \tag{11-15} \]
This relationship is indicated in Fig. 11-3 for a point P on the curve Γ.
Fig. 11-3 Moving triad of reference vectors.
To find dN/ds we differentiate the last result of Eqn (11-15) with respect
to s, and use Eqns (11-10), (11-13) together with the other results of Eqn
(11-15) to obtain,
\[ \frac{d\mathbf{N}}{ds} = \frac{d\mathbf{B}}{ds} \times \mathbf{T} + \mathbf{B} \times \frac{d\mathbf{T}}{ds} = -\tau\mathbf{N} \times \mathbf{T} + \kappa\mathbf{B} \times \mathbf{N}, \]
whence
\[ \frac{d\mathbf{N}}{ds} = \tau\mathbf{B} - \kappa\mathbf{T}. \tag{11-16} \]
The study of the geometrical properties of space curves using the calculus
techniques is called the differential geometry of curves, and it has as its basis
the three equations
\[ \frac{d\mathbf{T}}{ds} = \kappa\mathbf{N}, \quad \frac{d\mathbf{B}}{ds} = -\tau\mathbf{N}, \quad \frac{d\mathbf{N}}{ds} = \tau\mathbf{B} - \kappa\mathbf{T}, \tag{11-17} \]
which are called the Serret-Frenet equations. Naturally, similar ideas lead
to the differential geometry of surfaces, though we shall make no further
use of such ideas in this first account of the subject.
Example 11-4 Find the torsion of the circular helix of Example 11-3.
Solution In the previous example it was shown that ds/dt = 2√5 and
\[ \mathbf{N} = -\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}, \]
\[ \mathbf{B} = -\frac{1}{\sqrt{5}}\bigl(2\sin(t + \mu)\,\mathbf{i} + 2\cos(t + \mu)\,\mathbf{j} + \mathbf{k}\bigr). \]
Hence,
\[ \frac{d\mathbf{B}}{ds} = \left(\frac{d\mathbf{B}}{dt}\right)\Big/\left(\frac{ds}{dt}\right) = -\frac{1}{5}\bigl(\cos(t + \mu)\,\mathbf{i} - \sin(t + \mu)\,\mathbf{j}\bigr). \]
An application of Eqn (11-14) gives
\[ \tau = \frac{1}{5}\bigl[-\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}\bigr] \cdot \bigl[\cos(t + \mu)\,\mathbf{i} - \sin(t + \mu)\,\mathbf{j}\bigr] = -\frac{1}{5}. \]
This result might have been anticipated, for the circular helix in question is
similar to a screw thread with a constant pitch, and consequently its curvature
and twist properties must be the same at all points.
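The torsion of this helix can also be estimated directly from Eqn (11-14). The sketch below assumes the expressions for T and N obtained in Example 11-3, forms B = T × N, and approximates dB/dt by a central difference; the value of μ and the step h are arbitrary assumptions.

```python
import math

MU = 0.3                      # arbitrary value assumed for the constant mu
DS_DT = 2 * math.sqrt(5)      # ds/dt for this helix, from Example 11-3

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def dot(p, q):
    return sum(x * y for x, y in zip(p, q))

def tangent(t):
    """Unit tangent T of Example 11-3."""
    c, s = math.cos(t + MU), math.sin(t + MU)
    return (-s / math.sqrt(5), -c / math.sqrt(5), 2 / math.sqrt(5))

def normal(t):
    """Principal unit normal N of Example 11-3."""
    return (-math.cos(t + MU), math.sin(t + MU), 0.0)

def binormal(t):
    """B = T x N, Eqn (11-12)."""
    return cross(tangent(t), normal(t))

def torsion(t, h=1e-5):
    """tau = -N . dB/ds, with dB/ds = (dB/dt)/(ds/dt) by central difference."""
    bp, bm = binormal(t + h), binormal(t - h)
    dB_ds = tuple((p - m) / (2 * h * DS_DT) for p, m in zip(bp, bm))
    return -dot(normal(t), dB_ds)
```

With the sign convention dB/ds = −τN of Eqn (11-13), torsion(t) evaluates to approximately −1/5 at every t, so the torsion is constant along the helix.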
11-2 Antiderivatives and integrals of vector functions
The notion of an antiderivative, already encountered in Chapter 8, extends
naturally to a vector function of a real variable.
definition 11-3 (antiderivative of a vector function) The vector function
F(t) of the real variable t will be said to be the antiderivative of the vector
function f(t) if
\[ \frac{d\mathbf{F}}{dt} = \mathbf{f}(t). \]
Naturally, an antiderivative F(t) is indeterminate so far as an additive
arbitrary constant vector C is concerned, because by Theorem 11-3 (e),
dC/dt = 0. Continuing the convention adopted in Chapter 8, the operation
of antidifferentiation with respect to a vector function of the single real
variable t will be denoted by ∫, so that
\[ \int \mathbf{f}(t)\,dt = \mathbf{F}(t) + \mathbf{C}, \tag{11-18} \]
where C is an arbitrary constant vector.
It is obvious that Eqn (11-18), when taken in conjunction with Theorem
11-2, implies the following result.
theorem 11-4 (antiderivative of vector function) If
\[ \int \mathbf{f}(t)\,dt = \mathbf{F}(t) + \mathbf{C}, \]
where f(t) = f₁(t)i + f₂(t)j + f₃(t)k, F(t) = F₁(t)i + F₂(t)j + F₃(t)k and C
= C₁i + C₂j + C₃k is an arbitrary constant vector, then
\[ \int f_i(t)\,dt = F_i(t) + C_i, \quad i = 1, 2, 3 \]
with
\[ \frac{dF_i}{dt} = f_i(t). \]
Expressed in words, the antiderivative of f(t) has components equal to the
antiderivatives of the components of f(t). As with the scalar case, in many
books the entire right-hand side of Eqn (11-18) is loosely referred to as the
indefinite integral of the vector function f(t), rather than as here using this
term to refer only to its first member.
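Example 11-5 Find the antiderivative of f(t) given that
\[ \mathbf{f}(t) = \cos t\,\mathbf{i} + (1 + t^2)\,\mathbf{j} + e^{-t}\,\mathbf{k}. \]
Solution It follows immediately from Theorem 11-4 that,
\[ \int \mathbf{f}(t)\,dt = \mathbf{i}\int \cos t\,dt + \mathbf{j}\int (1 + t^2)\,dt + \mathbf{k}\int e^{-t}\,dt = \sin t\,\mathbf{i} + \left(t + \frac{t^3}{3}\right)\mathbf{j} - e^{-t}\,\mathbf{k} + \mathbf{C}. \]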
The obvious modification to Theorem 11-4 to enable us to work with
definite integrals of vector functions of a single real variable comprises the
next theorem. Because it is strictly analogous to the scalar case it is offered
without proof.
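theorem 11-5 (definite integral of vector function) If F(t) is an
antiderivative of f(t), then
\[ \int_a^b \mathbf{f}(t)\,dt = \mathbf{F}(b) - \mathbf{F}(a). \]
Example 11-6 Evaluate the definite integral
\[ \int_0^{\pi/4} (t^2\,\mathbf{i} + \sec^2 t\,\mathbf{j} + \mathbf{k})\,dt. \]
Solution From Theorem 11-5 we have the result
\[ \int_0^{\pi/4} (t^2\,\mathbf{i} + \sec^2 t\,\mathbf{j} + \mathbf{k})\,dt = \left[\frac{t^3}{3}\,\mathbf{i} + \tan t\,\mathbf{j} + t\,\mathbf{k}\right]_0^{\pi/4} = \frac{\pi^3}{192}\,\mathbf{i} + \mathbf{j} + \frac{\pi}{4}\,\mathbf{k}. \]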
A slightly more interesting application of a definite integral is provided
by the following example concerning the motion of a particle in space.
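Example 11-7 A point moving in space has acceleration
\[ \sin 2t\,\mathbf{i} - \cos 2t\,\mathbf{k}. \]
Find the equation of its path if it passes through the point with position
vector r₀ = j + 2k with velocity 2j at time t = 0.
Solution If r is the general position vector of the point at time t, then the
velocity v(t) = dr/dt and the acceleration a(t) = d²r/dt². Hence
\[ \frac{d^2\mathbf{r}}{dt^2} = \sin 2t\,\mathbf{i} - \cos 2t\,\mathbf{k}, \]
so that integrating the acceleration equation from 0 to t and replacing t in the
integrand by the dummy variable τ gives
\[ \int_0^t \left(\frac{d^2\mathbf{r}}{d\tau^2}\right)d\tau = \int_0^t (\sin 2\tau\,\mathbf{i} - \cos 2\tau\,\mathbf{k})\,d\tau. \]
Hence
\[ \left[\frac{d\mathbf{r}}{d\tau}\right]_0^t = -\tfrac{1}{2}\bigl[\cos 2\tau\,\mathbf{i} + \sin 2\tau\,\mathbf{k}\bigr]_0^t \]
and so
\[ \mathbf{v}(t) = \mathbf{v}_0 + \tfrac{1}{2}(1 - \cos 2t)\,\mathbf{i} - \tfrac{1}{2}\sin 2t\,\mathbf{k}. \]
Now from the initial conditions of the problem v₀ = 2j, so that the velocity
equation becomes
\[ \mathbf{v}(t) = \tfrac{1}{2}(1 - \cos 2t)\,\mathbf{i} + 2\,\mathbf{j} - \tfrac{1}{2}\sin 2t\,\mathbf{k}. \]
To find the equation of the path a further integration is required so, setting
v(t) = dr/dt, integrating the velocity equation from 0 to t gives
\[ \int_0^t \left(\frac{d\mathbf{r}}{d\tau}\right)d\tau = \int_0^t \bigl(\tfrac{1}{2}(1 - \cos 2\tau)\,\mathbf{i} + 2\,\mathbf{j} - \tfrac{1}{2}\sin 2\tau\,\mathbf{k}\bigr)\,d\tau. \]
Hence
\[ \bigl[\mathbf{r}(\tau)\bigr]_0^t = \bigl[\tfrac{1}{2}(\tau - \tfrac{1}{2}\sin 2\tau)\,\mathbf{i} + 2\tau\,\mathbf{j} + \tfrac{1}{4}\cos 2\tau\,\mathbf{k}\bigr]_0^t \]
and so
\[ \mathbf{r}(t) = \mathbf{r}_0 + \tfrac{1}{2}(t - \tfrac{1}{2}\sin 2t)\,\mathbf{i} + 2t\,\mathbf{j} + \tfrac{1}{4}(\cos 2t - 1)\,\mathbf{k}. \]
Again appealing to the initial conditions of the problem we find that r₀ =
j + 2k, so that, finally, the particle path must be
\[ \mathbf{r}(t) = \tfrac{1}{2}(t - \tfrac{1}{2}\sin 2t)\,\mathbf{i} + (1 + 2t)\,\mathbf{j} + \tfrac{1}{4}(7 + \cos 2t)\,\mathbf{k}. \]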
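The path found in Example 11-7 can be compared with a direct numerical integration of the equation of motion. This is a sketch under stated assumptions: the classical fourth-order Runge-Kutta scheme (a swapped-in numerical technique, not the method of the text) is applied to dr/dt = v, dv/dt = a(t), and the step count and end time are arbitrary.

```python
import math

def accel(t):
    """Given acceleration a(t) = sin 2t i - cos 2t k."""
    return (math.sin(2 * t), 0.0, -math.cos(2 * t))

def exact_path(t):
    """Closed-form path obtained in Example 11-7."""
    return (0.5 * (t - 0.5 * math.sin(2 * t)),
            1 + 2 * t,
            0.25 * (7 + math.cos(2 * t)))

def integrate_path(t_end, steps=2000):
    """Runge-Kutta (RK4) integration from r(0) = j + 2k, v(0) = 2j.

    Because the acceleration depends on t only, the four RK4 stages
    collapse to the two update formulas used inside the loop.
    """
    r = [0.0, 1.0, 2.0]
    v = [0.0, 2.0, 0.0]
    h = t_end / steps
    for n in range(steps):
        t = n * h
        a0, am, a1 = accel(t), accel(t + h / 2), accel(t + h)
        for i in range(3):
            # Position is advanced first, using the velocity at time t.
            r[i] += h * v[i] + h * h * (a0[i] + 2 * am[i]) / 6
            v[i] += h * (a0[i] + 4 * am[i] + a1[i]) / 6
    return tuple(r)

numeric = integrate_path(2.0)
exact = exact_path(2.0)
```

The two results agree closely, which supports the constants of integration fixed by the initial conditions r₀ = j + 2k and v₀ = 2j.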
The form of definite integral of a vector function so far considered is
itself a vector. We now discuss one final generalization of the notion of a
definite integral involving a vector function that generates a scalar.
Let a curve Γ defined parametrically in terms of the arc length s have the
general position vector r = r(s) and unit tangent vector T(s), and let F(s)
be a vector function of s. Then at any point of Γ the scalar function φ(s) =
F(s) . T(s) represents the component of F(s) tangential to Γ. If the scalar
function φ(s) is then integrated from s = a to s = b, this is obviously
equivalent to integrating the tangential component of F(s) along Γ from the point
r = r(a) to the point r = r(b). An integral of this form is therefore called
either a line integral or a curvilinear integral of the vector function F(s) taken
along the curve Γ, which is sometimes referred to as the path of integration.
definition 11-4 (line integral of vector function) The line integral of the
vector function F(s) taken along the curve Γ between the points A and B
with position vectors r = r(a) and r = r(b), respectively, is the quantity
\[ J = \int_a^b \phi(s)\,ds = \int_a^b \mathbf{F} \cdot \mathbf{T}\,ds, \]
where φ(s) = F(s) . T(s), s denotes arc length along Γ, and T(s) is the unit
tangent vector to Γ.
In terms of the general position vector r of a point on the curve and the
fact that s is the arc length along Γ, we obviously have the relationship
dr = T ds, so that the line integral may also be written
\[ J = \int_A^B \mathbf{F} \cdot d\mathbf{r} \]
or, more simply still if Γ denotes part of a curve, as
\[ J = \int_\Gamma \mathbf{F} \cdot d\mathbf{r}. \]
In component form, setting the differential dr = dx i + dy j + dz k and
F = F₁i + F₂j + F₃k, we have at once
\[ \int_\Gamma \mathbf{F} \cdot d\mathbf{r} = \int_\Gamma F_1\,dx + F_2\,dy + F_3\,dz. \tag{11-19} \]
If desired, the line integral (11-19) may be defined vectorially in terms of
the limit of a sum in a manner strictly analogous to the definition of an
ordinary definite integral. To achieve this, let the interval a < s < b be
divided into n sub-intervals s_{i−1} < s < s_i, with i = 1, 2, . . ., n, where
s₀ = a and s_n = b. Then setting dr_i = r(s_i) − r(s_{i−1}) as in Fig. 11-4, the line
integral (11-19) may be approximated by the sum
\[ J_n = \sum_{i=1}^{n} \mathbf{F}(s_i) \cdot d\mathbf{r}_i. \tag{11-20} \]
Fig. 11-4 Line integral of F along Γ.
If the number of sub-divisions n is now allowed to tend to infinity in such a
manner that the lengths of all the sub-divisions tend to zero then, as with an
ordinary definite integral, we arrive at the result
\[ \int_\Gamma \mathbf{F} \cdot d\mathbf{r} = \lim_{n \to \infty} \sum_{i=1}^{n} \mathbf{F}(s_i) \cdot d\mathbf{r}_i. \tag{11-21} \]
When used in this context, the differential dr_i is usually called a line element
of the curve Γ joining A to B.
Example 11-8 Evaluate the line integral
\[ J = \int_\Gamma \mathbf{F} \cdot d\mathbf{r}, \]
given that F = yz i + 2xz j + xy k and Γ is that part of the circular helix
x = a cos t, y = a sin t, z = kt that corresponds to the interval 0 ≤ t ≤ 2π.
Solution First we use Eqn (11-19) to write the line integral as
\[ \int_\Gamma \mathbf{F} \cdot d\mathbf{r} = \int_\Gamma yz\,dx + 2xz\,dy + xy\,dz. \]
Now along the path Γ we have the relationships
\[ x = a\cos t, \quad y = a\sin t, \quad z = kt, \]
which imply the differential relationships
\[ dx = -a\sin t\,dt, \quad dy = a\cos t\,dt, \quad dz = k\,dt. \]
Hence
\[ J = \int_0^{2\pi} (-a^2kt\sin^2 t + 2a^2kt\cos^2 t + a^2k\sin t\cos t)\,dt \]
\[ = a^2k\left[-\left(\frac{t^2}{4} - \frac{t\sin 2t}{4} - \frac{\cos 2t}{8}\right) + 2\left(\frac{t^2}{4} + \frac{t\sin 2t}{4} + \frac{\cos 2t}{8}\right) - \frac{\cos 2t}{4}\right]_0^{2\pi} \]
\[ = \pi^2a^2k. \]
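The value π²a²k can be confirmed by numerical quadrature. The sketch below assumes the integrand F . dr = yz dx + 2xz dy + xy dz used in the solution, arbitrary values for a and k, and a composite Simpson rule (a swapped-in numerical technique, not part of the text).

```python
import math

A, K = 1.5, 0.4   # arbitrary values assumed for the constants a and k

def integrand(t):
    """F . dr/dt along the helix x = a cos t, y = a sin t, z = kt."""
    x, y, z = A * math.cos(t), A * math.sin(t), K * t
    dx, dy, dz = -A * math.sin(t), A * math.cos(t), K
    return y * z * dx + 2 * x * z * dy + x * y * dz

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

line_integral = simpson(integrand, 0.0, 2 * math.pi)
expected = math.pi ** 2 * A ** 2 * K
```

The quadrature reproduces π²a²k closely, which is consistent with the parametrised integrand and the antiderivative given above.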
11-3 Some applications
Kinematics, an important branch of mechanics, is essentially concerned with
the geometrical aspect of the motion of particles along curves. Of particular
importance is that class of motions that occur entirely in one plane, and so
are called planar motions. In many of these situations, for example, particle
motion in an orbit, the position of a particle is best defined in terms of the
polar coordinates (r, θ) in the plane of the motion. Let us then determine
expressions for the velocity and acceleration of a particle in terms of polar
coordinates.
Fig. 11-5 Planar motion of particle in terms of polar coordinates.
We first appeal to Fig. 11-5, which represents a particle P moving in the
indicated direction along the curve Γ. The unit vectors R, Θ are normal to
each other and are such that R is directed from O to P along the radius
vector OP, and Θ points in the direction of increasing θ. Then clearly R and
Θ are vector functions of the single variable θ, with
\[ \mathbf{R} = \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j} \]
and
\[ \boldsymbol{\Theta} = -\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j}. \tag{11-22} \]
It follows from these relationships that
\[ \frac{d\mathbf{R}}{d\theta} = \boldsymbol{\Theta} \quad \text{and} \quad \frac{d\boldsymbol{\Theta}}{d\theta} = -\mathbf{R}. \tag{11-23} \]
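In terms of the unit vectors R, Θ the point P has the position vector
\[ \mathbf{r} = r\mathbf{R}, \tag{11-24} \]
so that the velocity dr/dt must be
\[ \frac{d\mathbf{r}}{dt} = \frac{dr}{dt}\mathbf{R} + r\frac{d\mathbf{R}}{dt} = \frac{dr}{dt}\mathbf{R} + r\frac{d\mathbf{R}}{d\theta}\frac{d\theta}{dt}, \]
showing that the velocity vector of P is
\[ \dot{\mathbf{r}} = \dot{r}\mathbf{R} + r\dot{\theta}\boldsymbol{\Theta}, \tag{11-25} \]
where differentiation with respect to time has been denoted by a dot.
Here the quantity $\dot{r}$ is called the radial component of velocity and $r\dot{\theta}$ is
called the transverse component of velocity. A further differentiation with
respect to time yields for the acceleration vector $\ddot{\mathbf{r}} = d^2\mathbf{r}/dt^2$ the expression
\[ \ddot{\mathbf{r}} = \ddot{r}\mathbf{R} + \dot{r}\dot{\mathbf{R}} + \dot{r}\dot{\theta}\boldsymbol{\Theta} + r\ddot{\theta}\boldsymbol{\Theta} + r\dot{\theta}\dot{\boldsymbol{\Theta}} \]
or
\[ \ddot{\mathbf{r}} = \ddot{r}\mathbf{R} + \dot{r}\dot{\theta}\frac{d\mathbf{R}}{d\theta} + (\dot{r}\dot{\theta} + r\ddot{\theta})\boldsymbol{\Theta} + r\dot{\theta}^2\frac{d\boldsymbol{\Theta}}{d\theta}. \]
Hence by Eqn (11-23) this is seen to be equivalent to
\[ \ddot{\mathbf{r}} = (\ddot{r} - r\dot{\theta}^2)\mathbf{R} + (2\dot{r}\dot{\theta} + r\ddot{\theta})\boldsymbol{\Theta}. \tag{11-26} \]
The quantity $\ddot{r} - r\dot{\theta}^2$ is called the radial component of acceleration, and
$2\dot{r}\dot{\theta} + r\ddot{\theta}$ is called the transverse component of acceleration.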
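The radial and transverse components of acceleration can be tested against a direct Cartesian calculation. The sketch below assumes a hypothetical planar motion r = 1 + t/2, θ = t² (chosen only for illustration) and compares Eqn (11-26) with a central second difference of the Cartesian position.

```python
import math

def polar_state(t):
    """Hypothetical motion r = 1 + t/2, theta = t**2, with time derivatives."""
    r, r_dot, r_ddot = 1 + 0.5 * t, 0.5, 0.0
    th, th_dot, th_ddot = t * t, 2 * t, 2.0
    return r, r_dot, r_ddot, th, th_dot, th_ddot

def acceleration_polar(t):
    """Cartesian components of the acceleration built from Eqn (11-26)."""
    r, r_dot, r_ddot, th, th_dot, th_ddot = polar_state(t)
    radial = r_ddot - r * th_dot ** 2
    transverse = 2 * r_dot * th_dot + r * th_ddot
    R = (math.cos(th), math.sin(th))          # unit vector R
    Theta = (-math.sin(th), math.cos(th))     # unit vector Theta
    return (radial * R[0] + transverse * Theta[0],
            radial * R[1] + transverse * Theta[1])

def position(t):
    """Cartesian position (r cos theta, r sin theta)."""
    r, _, _, th, _, _ = polar_state(t)
    return (r * math.cos(th), r * math.sin(th))

def acceleration_numeric(t, h=1e-4):
    """Central second difference of the Cartesian position."""
    pp, p0, pm = position(t + h), position(t), position(t - h)
    return tuple((a - 2 * b + c) / (h * h) for a, b, c in zip(pp, p0, pm))
```

For this assumed motion the two accelerations agree to within the accuracy of the finite difference, as Eqn (11-26) requires.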
Example 11-9 A particle is constrained to move with constant speed $v$
along the cardioid $r = a(1 + \cos\theta)$. Prove that
$$v = 2a\dot{\theta}\cos(\theta/2),$$
and show that the radial component of the acceleration is constant.
Solution From Eqn (11-25) and the expression $r = a(1 + \cos\theta)$, it follows
that the velocity vector $\dot{\mathbf{r}}$ is given by
$$\dot{\mathbf{r}} = -a\sin\theta\,\dot{\theta}\,\mathbf{R} + a(1 + \cos\theta)\dot{\theta}\,\boldsymbol{\Theta}.$$
Now as $v^2 = \dot{\mathbf{r}}^2 = \dot{\mathbf{r}} \cdot \dot{\mathbf{r}}$, we have
$$v^2 = a^2\dot{\theta}^2\sin^2\theta + a^2\dot{\theta}^2(1 + \cos\theta)^2 = 2a^2\dot{\theta}^2(1 + \cos\theta).$$
Using the identity $1 + \cos\theta = 2\cos^2(\theta/2)$ in this expression and taking the
square root yields the required result
$$v = 2a\dot{\theta}\cos(\theta/2).$$
To complete the problem we now make appeal to the fact that the radial
acceleration component is $\ddot{r} - r\dot{\theta}^2$, whilst by supposition $v$ = constant.
From our previous working we know that
$$v^2 = 2a^2\dot{\theta}^2(1 + \cos\theta),$$
so that differentiating with respect to $t$ and cancelling $\dot{\theta}$ gives
$$\ddot{\theta} = \frac{\dot{\theta}^2\sin\theta}{2(1 + \cos\theta)}$$
or, as
$$\dot{\theta}^2 = \frac{v^2}{2a^2(1 + \cos\theta)},$$
$$\ddot{\theta} = \frac{v^2\sin\theta}{4a^2(1 + \cos\theta)^2}.$$
Hence as $\ddot{r} = -a(\cos\theta\,\dot{\theta}^2 + \sin\theta\,\ddot{\theta})$, substituting for $\ddot{r}$, $\dot{\theta}^2$, and $\ddot{\theta}$ in the radial
component of acceleration we find, as required, that
$$\ddot{r} - r\dot{\theta}^2 = -\frac{3v^2}{4a} = \text{constant}.$$
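The result of Example 11-9 can be spot-checked numerically. The short Python sketch below is not part of the original text; the function name and the sample values of $a$, $v$ and $\theta$ are illustrative assumptions. It substitutes the expressions for $\dot{\theta}^2$ and $\ddot{\theta}$ obtained above into $\ddot{r} - r\dot{\theta}^2$ and compares the outcome with $-3v^2/4a$.

```python
import math

def radial_acceleration(theta, a=1.0, v=1.0):
    """Radial acceleration on the cardioid r = a(1 + cos(theta)) at constant speed v."""
    c, s = math.cos(theta), math.sin(theta)
    theta_dot_sq = v**2 / (2 * a**2 * (1 + c))       # from v^2 = 2 a^2 theta_dot^2 (1 + cos theta)
    theta_ddot = v**2 * s / (4 * a**2 * (1 + c)**2)  # from differentiating that relation
    r = a * (1 + c)
    r_ddot = -a * (c * theta_dot_sq + s * theta_ddot)
    return r_ddot - r * theta_dot_sq

# Sample values chosen only for illustration (theta = pi is excluded: r = 0 there).
a, v = 2.0, 3.0
for theta in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(theta, radial_acceleration(theta, a, v), -3 * v**2 / (4 * a))
```

The two printed columns should agree to rounding error for every admissible $\theta$, which is the content of the statement that the radial component is constant.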
A vector treatment of particle dynamics follows quite naturally from the
ideas presented so far. Thus a particle of variable mass m moving with velocity
v has, by definition, the linear momentum M, where
$$\mathbf{M} = m\mathbf{v}.$$
Now by Newton's second law of motion we know that, with a suitable choice
of units, we may equate the force $\mathbf{F}$ to the rate of change of momentum, so
it follows that we may write
$$\mathbf{F} = \frac{d\mathbf{M}}{dt}.$$
However,
$$\frac{d\mathbf{M}}{dt} = \frac{dm}{dt}\mathbf{v} + m\frac{d\mathbf{v}}{dt}$$
and hence
$$\mathbf{F} = \frac{dm}{dt}\mathbf{v} + m\frac{d\mathbf{v}}{dt}. \tag{11-27}$$
In the case of a particle of constant mass $m$, we have $dm/dt = 0$, reducing
Eqn (11-27) to the familiar equation of motion
$$\mathbf{F} = m\mathbf{a}, \tag{11-28}$$
where $\mathbf{a} = d\mathbf{v}/dt$ is the acceleration.
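Similarly, the angular momentum of a particle of fixed mass $m$ about the
origin is defined by the relation $\boldsymbol{\Omega} = \mathbf{r} \times m\mathbf{v}$, where $\mathbf{r}$ is the position vector
of the particle relative to the origin and $\mathbf{v} = d\mathbf{r}/dt$ is its velocity. Then the
rate of change of angular momentum about the origin is
$$\frac{d\boldsymbol{\Omega}}{dt} = m\mathbf{v} \times \mathbf{v} + m\mathbf{r} \times \frac{d\mathbf{v}}{dt} = \mathbf{r} \times \mathbf{F}, \tag{11-29}$$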
by virtue of Eqn (11-28). This is the vector form of the principle of angular
momentum, which asserts that the rate of change of angular momentum
about the origin is equal to the moment about the origin of the force acting
on the particle.
The line integral
$$J = \int_{\Gamma} \mathbf{F} \cdot d\mathbf{r}$$
also occurs naturally in many contexts, perhaps the simplest of which is in
connection with the work done by a force. If $\mathbf{F}$ is identified with a force, and
$d\mathbf{r}$ is a displacement along some specific curve $\Gamma$ joining points A and B, then
$J$ represents the work done by the varying force $\mathbf{F}$ as it moves its point of
application along the curve $\Gamma$ from A to B (cf. Fig. 11-4).
In the special case that $\mathbf{F}$ is a constant force and $\Gamma$ is a straight line segment
with end points at $s = a$ and $s = b$ this simplifies to an already familiar
result. Suppose that $\mathbf{F} = F\boldsymbol{\alpha}$ and $d\mathbf{r} = ds\,\boldsymbol{\beta}$, where $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$ are constant unit
vectors inclined at an angle $\theta$, then
$$J = \int_{\Gamma} \mathbf{F} \cdot d\mathbf{r} = \int_a^b F(\boldsymbol{\alpha} \cdot \boldsymbol{\beta})\,ds = F(b - a)\cos\theta.$$
Thus, as would be expected in these circumstances, the work done by $\mathbf{F}$ is
the product of the component $F\cos\theta$ of the force $\mathbf{F}$ along the line of motion
and the total displacement $(b - a)$.
The line integral also occurs in fields other than particle dynamics, and in
fluid mechanics for example, if $\mathbf{F}$ is identified with the fluid velocity $\mathbf{v}$ and $\Gamma$
is some closed curve drawn in the fluid, then the scalar quantity $\gamma$ defined by
the line integral
$$\gamma = \int_{\Gamma} \mathbf{v} \cdot d\mathbf{r}$$
is called the circulation around the curve $\Gamma$. In more advanced works it is
shown that $\gamma$ provides a measure of the degree of rotational motion present
in a fluid. For a special class of fluid flows known as potential flows the circulation
is everywhere zero, irrespective of the choice of $\Gamma$. These flows are
said to be irrotational and are of fundamental importance. Line integrals
around closed curves are generally denoted by the symbol $\oint$ with the convention
that the path of integration is taken anti-clockwise, so that for the
circulation $\gamma$ we would write
$$\gamma = \oint_{\Gamma} \mathbf{v} \cdot d\mathbf{r}.$$
A reversal of the direction of integration around $\Gamma$ would change the sign
of $\gamma$.
An exactly similar application of the line integral occurs in electromagnetic
theory, where the electromotive force (e.m.f.) between the ends A and B of a
wire coinciding with a curve $\Gamma$ is related to the electric field vector $\mathbf{E}$ by the
line integral
$$\text{e.m.f.} = \int_A^B \mathbf{E} \cdot d\mathbf{r}.$$
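Example 11-10 Find the work done by a force $\mathbf{F} = yz\,\mathbf{i} + x\,\mathbf{j} + xz\,\mathbf{k}$ in
moving its point of application along the curve $\Gamma$ defined by $x = t$, $y = t^2$,
$z = t^3$ from the point with parameter $t = 1$ to the point with parameter
$t = 2$.
Solution
$$\text{Work done} = \int_{\Gamma} \mathbf{F} \cdot d\mathbf{r} = \int_{\Gamma} (yz\,\mathbf{i} + x\,\mathbf{j} + xz\,\mathbf{k}) \cdot (dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}) = \int_{\Gamma} yz\,dx + x\,dy + xz\,dz.$$
Now as $x = t$, $y = t^2$, $z = t^3$, it follows that
$$dx = dt, \quad dy = 2t\,dt, \quad dz = 3t^2\,dt$$
and so $yz\,dx = t^5\,dt$, $x\,dy = 2t^2\,dt$ and $xz\,dz = 3t^6\,dt$. Substituting in the above expression, we find
$$\text{Work done} = \int_1^2 (t^5 + 2t^2 + 3t^6)\,dt = \frac{21}{2} + \frac{14}{3} + \frac{381}{7} = \frac{2923}{42} \text{ units}.$$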
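A line integral along a parametrically defined curve reduces to an ordinary integral in the parameter, so it is easily checked by numerical quadrature. The following Python sketch is an illustration only and is not taken from the text: the helper name `line_integral` and the use of a composite Simpson rule are assumptions made here. It evaluates the work integral of Example 11-10 for the force and path exactly as they are stated in the example, so that the printed value can be set against the hand calculation.

```python
def line_integral(F, r, dr, t0, t1, n=1000):
    """Composite Simpson estimate of the integral of F(r(t)) . r'(t) dt over [t0, t1]."""
    if n % 2:
        n += 1                      # Simpson's rule needs an even number of strips
    h = (t1 - t0) / n

    def integrand(t):
        Fx, Fy, Fz = F(*r(t))
        dx, dy, dz = dr(t)
        return Fx * dx + Fy * dy + Fz * dz

    total = integrand(t0) + integrand(t1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(t0 + k * h)
    return total * h / 3

# Force and curve as stated in Example 11-10.
force = lambda x, y, z: (y * z, x, x * z)
curve = lambda t: (t, t**2, t**3)
curve_derivative = lambda t: (1.0, 2 * t, 3 * t**2)

work = line_integral(force, curve, curve_derivative, 1.0, 2.0)
print(work)
```

Any other smooth force field or parametrized path can be substituted by changing the three lambda expressions.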
Example 11-11 If the fluid velocity $\mathbf{v} = x^2y\,\mathbf{i}$, determine the circulation
$\gamma$ of $\mathbf{v}$ around the contour $\Gamma$ comprising the boundary of the rectangle
$x = \pm a$, $y = \pm b$.
[Fig. 11-6: the rectangular contour $\Gamma$ with corners P, Q, R, S and sides $x = \pm a$, $y = \pm b$, centred on the origin O.]
Solution By definition, the circulation $\gamma$ is
$$\gamma = \oint_{\Gamma} \mathbf{v} \cdot d\mathbf{r} = \oint_{\Gamma} x^2y\,\mathbf{i} \cdot (dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}) = \oint_{\Gamma} x^2y\,dx,$$
where the direction of integration is anti-clockwise around $\Gamma$. Now the line
integral around $\Gamma$ may be represented as the sum of four integrals as follows,
$$\gamma = \int_P^Q x^2y\,dx + \int_Q^R x^2y\,dx + \int_R^S x^2y\,dx + \int_S^P x^2y\,dx,$$
where the limits refer to the corners of the rectangle in Fig. 11-6.
The first and third integrals vanish since $x$ is constant along PQ and RS,
with the consequence that $dx = 0$. Along QR, $y = b$ and along SP $y = -b$,
so that
$$\gamma = \int_a^{-a} bx^2\,dx + \int_{-a}^{a} (-bx^2)\,dx = -\tfrac{4}{3}a^3b.$$
11-4 Fields, gradient, and directional derivative
The scalar function $\phi = \sqrt{1 - x^2} + \sqrt{1 - y^2} + \sqrt{1 - z^2}$ is defined
within and on the cube shaped domain $|x| \le 1$, $|y| \le 1$, $|z| \le 1$ and
assigns a specific number $\phi$ to every point within that region. In the language
of vector analysis, $\phi$ is said to define a scalar field throughout the cube. In
general, any scalar function $\phi$ of position will define a scalar field within its
domain of definition. A typical physical example of a scalar field is provided
by the temperature at each point of a body.
Similarly, if F is a vector function of position, we say that F defines a
vector field throughout its domain of definition in the sense that it assigns a
specific vector to each point. Thus the vector function $\mathbf{F} = \sin x\,\mathbf{i} + xy\,\mathbf{j} + ye^z\,\mathbf{k}$
defines a vector field throughout all space.
As heat flows in the direction of decreasing temperature, it follows that
associated with the scalar temperature field within a body there must also be
a vector field which assigns to each point a vector describing the direction and
maximum rate of flow of heat. Other physical examples of vector fields are
provided by the velocity field v throughout a fluid, and the magnetic field H
throughout a region.
To examine more closely the nature of a scalar field, and to see one way
in which a special type of vector field arises, we must now define what is
called the gradient of a scalar function. This is a vector differentiation
operation that associates a vector field with every continuously differentiable
scalar function.
definition 11-5 (gradient of scalar function) If the scalar function
$\phi(x, y, z)$ is a continuously differentiable function with respect to the independent
variables $x$, $y$, and $z$ then the gradient of $\phi$, written $\operatorname{grad}\phi$, is defined
to be the vector
$$\operatorname{grad}\phi = \frac{\partial\phi}{\partial x}\mathbf{i} + \frac{\partial\phi}{\partial y}\mathbf{j} + \frac{\partial\phi}{\partial z}\mathbf{k}.$$
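For the moment let it be understood that $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ is a specific
point, and consider a displacement from it $d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}$. Then it
follows from the definition of $\operatorname{grad}\phi$ that
$$d\mathbf{r} \cdot \operatorname{grad}\phi = \frac{\partial\phi}{\partial x}dx + \frac{\partial\phi}{\partial y}dy + \frac{\partial\phi}{\partial z}dz,$$
in which it is supposed that $\operatorname{grad}\phi$ is evaluated at $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$. Theorem
5-19 then asserts that the right-hand side of this expression is simply the total
differential $d\phi$ of the scalar function $\phi$, so that we have the result
$$d\phi = d\mathbf{r} \cdot \operatorname{grad}\phi. \tag{11-27}$$
If we set $ds = |d\mathbf{r}|$, then $d\mathbf{r}/ds$ is the unit vector in the direction of $d\mathbf{r}$.
Writing $\mathbf{a} = d\mathbf{r}/ds$, Eqn (11-27) is thus seen to be equivalent to
$$\frac{d\phi}{ds} = \mathbf{a} \cdot \operatorname{grad}\phi. \tag{11-28}$$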
Because $\mathbf{a} \cdot \operatorname{grad}\phi$ is the projection of $\operatorname{grad}\phi$ along the unit vector $\mathbf{a}$, expression
(11-28) is called the directional derivative of $\phi$ in the direction of $\mathbf{a}$.
In other words, $\mathbf{a} \cdot \operatorname{grad}\phi$ is the rate of change of $\phi$ with respect to distance
measured in the direction of $\mathbf{a}$. We have already utilized the notion of a directional
derivative in connection with the derivation of the Cauchy-Riemann
equations, though at that time neither the term nor vector notation was
employed.
As the largest value of the projection $\mathbf{a} \cdot \operatorname{grad}\phi$ at a point occurs when $\mathbf{a}$
is taken in the same direction as $\operatorname{grad}\phi$, it follows that $\operatorname{grad}\phi$ points in the
direction in which the directional derivative of $\phi$ is greatest, and that the
magnitude $|\operatorname{grad}\phi|$ is the value of this greatest rate of change.
In more advanced treatments of the gradient operator it is this last
property that is used to define $\operatorname{grad}\phi$, since it is essentially independent of
the coordinate system that is utilized. From this more general point of view
our Definition 11-5 then becomes the interpretation of $\operatorname{grad}\phi$ in terms of
rectangular Cartesian coordinates.
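The directional derivative $\mathbf{a} \cdot \operatorname{grad}\phi$ lends itself to a quick numerical illustration. The Python sketch below is an addition for illustration, not the author's method: it estimates the partial derivatives by central differences, and the scalar function $\phi = x^2 + yz$, the point and the directions are hypothetical choices made only for this example.

```python
import math

def grad(phi, p, h=1e-6):
    """Central-difference estimate of grad(phi) at the point p = (x, y, z)."""
    g = []
    for k in range(3):
        forward, backward = list(p), list(p)
        forward[k] += h
        backward[k] -= h
        g.append((phi(*forward) - phi(*backward)) / (2 * h))
    return tuple(g)

def directional_derivative(phi, p, d):
    """a . grad(phi), where a is the unit vector in the direction of d."""
    norm = math.sqrt(sum(c * c for c in d))
    a = [c / norm for c in d]
    return sum(ai * gi for ai, gi in zip(a, grad(phi, p)))

phi = lambda x, y, z: x**2 + y * z      # hypothetical scalar field
p = (1.0, 2.0, 3.0)

g = grad(phi, p)                                        # exact gradient here is (2, 3, 2)
along_d = directional_derivative(phi, p, (1, 2, -2))    # exact value (2 + 6 - 4)/3 = 4/3
along_grad = directional_derivative(phi, p, g)          # exact value |grad phi| = sqrt(17)
print(g, along_d, along_grad)
```

Taking the direction along the gradient itself returns $|\operatorname{grad}\phi|$, which is the largest value the directional derivative can take at that point.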
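The vector differential operator $\nabla$, pronounced either 'del' or 'nabla', is
defined in terms of rectangular Cartesian coordinates as
$$\nabla = \mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}.$$
As the name implies, $\nabla$ is a vector differential operator, not a vector. It
only generates a vector when it acts on a suitably differentiable scalar function.
We have the obvious result that
$$\operatorname{grad}\phi \equiv \frac{\partial\phi}{\partial x}\mathbf{i} + \frac{\partial\phi}{\partial y}\mathbf{j} + \frac{\partial\phi}{\partial z}\mathbf{k} = \left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right)\phi \equiv \nabla\phi. \tag{11-30}$$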
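Example 11-12 Determine $\operatorname{grad}\phi$ if $\phi = z^2\cos(xy - \tfrac{1}{4}\pi)$, and hence
deduce its value at the point $(1, \tfrac{1}{2}\pi, 1)$.
Solution We have
$$\frac{\partial\phi}{\partial x} = -yz^2\sin(xy - \tfrac{1}{4}\pi), \quad \frac{\partial\phi}{\partial y} = -xz^2\sin(xy - \tfrac{1}{4}\pi)$$
and
$$\frac{\partial\phi}{\partial z} = 2z\cos(xy - \tfrac{1}{4}\pi).$$
Hence
$$\operatorname{grad}\phi = -yz^2\sin(xy - \tfrac{1}{4}\pi)\,\mathbf{i} - xz^2\sin(xy - \tfrac{1}{4}\pi)\,\mathbf{j} + 2z\cos(xy - \tfrac{1}{4}\pi)\,\mathbf{k}.$$
At the point $(1, \tfrac{1}{2}\pi, 1)$ we thus have
$$(\operatorname{grad}\phi)_{(1, \frac{1}{2}\pi, 1)} = \frac{1}{\sqrt{2}}\left(-(\tfrac{1}{2}\pi)\mathbf{i} - \mathbf{j} + 2\mathbf{k}\right).$$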
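Example 11-13 If $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$, and $r = |\mathbf{r}|$, deduce the form taken
by $\operatorname{grad} r^n$.
Solution As $r = (x^2 + y^2 + z^2)^{1/2}$, it follows from Eqn (11-30) and the
chain rule that
$$\operatorname{grad} r^n = \left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right)r^n = \left(\mathbf{i}\frac{\partial r}{\partial x}\frac{d}{dr} + \mathbf{j}\frac{\partial r}{\partial y}\frac{d}{dr} + \mathbf{k}\frac{\partial r}{\partial z}\frac{d}{dr}\right)r^n = nr^{n-1}\left(\frac{\partial r}{\partial x}\mathbf{i} + \frac{\partial r}{\partial y}\mathbf{j} + \frac{\partial r}{\partial z}\mathbf{k}\right).$$
However,
$$\frac{\partial r}{\partial x} = \frac{x}{r}, \quad \frac{\partial r}{\partial y} = \frac{y}{r}, \quad \frac{\partial r}{\partial z} = \frac{z}{r},$$
and so
$$\operatorname{grad} r^n = nr^{n-2}(x\mathbf{i} + y\mathbf{j} + z\mathbf{k}) = nr^{n-2}\mathbf{r}.$$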
The following theorem is an immediate consequence of the definition of
the gradient operator and of the operation of partial differentiation.
theorem 11-6 (properties of gradient operator) If $\phi$ and $\psi$ are two continuously
differentiable scalar functions in some domain D, and $a$, $b$ are
scalar constants, then
(a) $\operatorname{grad} a = \mathbf{0}$;
(b) $\operatorname{grad}(a\phi + b\psi) = a\operatorname{grad}\phi + b\operatorname{grad}\psi$;
(c) $\operatorname{grad}(\phi\psi) = \phi\operatorname{grad}\psi + \psi\operatorname{grad}\phi$.
The surfaces $\phi(x, y, z)$ = constant associated with a scalar function $\phi$ are
called level surfaces of $\phi$. If we form the total differential of $\phi$ at a point on a
specific level surface $\phi$ = constant then $d\phi = 0$ and, as in Eqn (5-23), we
obtain the result
$$\frac{\partial\phi}{\partial x}dx + \frac{\partial\phi}{\partial y}dy + \frac{\partial\phi}{\partial z}dz = 0.$$
This is equivalent to
$$d\mathbf{r} \cdot \operatorname{grad}\phi = 0, \tag{11-31}$$
where now $d\mathbf{r}$ is constrained to lie in the level surface.
This vector condition shows that $\operatorname{grad}\phi$ must be normal to $d\mathbf{r}$, and as $d\mathbf{r}$
is constrained to be an arbitrary tangential vector to the level surface at the
point in question, it follows that the vector $\operatorname{grad}\phi$ must be normal to the level
surface. Provided $\operatorname{grad}\phi$ does not vanish there, the unit normal $\mathbf{n}$ to the surface is thus $\mathbf{n} = \operatorname{grad}\phi/|\operatorname{grad}\phi|$.
Notice that this normal is unique apart from its sign. This simple argument
has proved the following general result.
theorem 11-7 (normal to level surface) If $\phi$ is a continuously differentiable
scalar function, the unit normal $\mathbf{n}$ to any point of the level surface $\phi$ = constant
is determined by
$$\mathbf{n} = \frac{\operatorname{grad}\phi}{|\operatorname{grad}\phi|}.$$
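Example 11-14 If $\phi = x^2 + 3xy^2 + yz^3 - 12$, find the unit normal $\mathbf{n}$ to
the level surface $\phi = 3$ at the point (1, 2, 1). Deduce the equation of the
tangent plane to the level surface at this point.
Solution The level surface $\phi = 3$ is defined by the equation $\psi = 0$, where
$$\psi = x^2 + 3xy^2 + yz^3 - 15 = 0.$$
Hence
$$\operatorname{grad}\psi = (2x + 3y^2)\mathbf{i} + (6xy + z^3)\mathbf{j} + 3yz^2\mathbf{k}$$
which, at (1, 2, 1), becomes
$$(\operatorname{grad}\psi)_{(1,2,1)} = 14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}.$$
As $\psi = 0$ is the desired level surface, it follows from Theorem 11-7 that the
unit normal to this surface at the point (1, 2, 1) must be
$$\mathbf{n} = \frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{(14)^2 + (13)^2 + (6)^2}} = \frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{401}}.$$
Now the equation of a plane is $\mathbf{n} \cdot \mathbf{r} = p$, where $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ is a
general point on the plane, $\mathbf{n}$ is the unit normal to the plane, and $p$ is its
perpendicular distance from the origin. The point $\mathbf{r}_0 = \mathbf{i} + 2\mathbf{j} + \mathbf{k}$ is a point
on the plane so that $\mathbf{n} \cdot \mathbf{r} = \mathbf{n} \cdot \mathbf{r}_0\ (= p)$. Hence
$$\left(\frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{401}}\right) \cdot (x\mathbf{i} + y\mathbf{j} + z\mathbf{k}) = \left(\frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{401}}\right) \cdot (\mathbf{i} + 2\mathbf{j} + \mathbf{k}),$$
showing that the required equation is
$$14x + 13y + 6z = 46.$$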
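The arithmetic of Example 11-14 is easily reproduced by machine. The Python sketch below is an illustrative addition (the function and variable names are assumptions, not notation from the text): it evaluates the hand-differentiated gradient of $\psi = x^2 + 3xy^2 + yz^3 - 15$ at (1, 2, 1), normalizes it to obtain the unit normal of Theorem 11-7, and forms the constant on the right-hand side of the tangent plane equation.

```python
import math

def grad_psi(x, y, z):
    """Gradient of psi = x^2 + 3xy^2 + yz^3 - 15, differentiated by hand."""
    return (2 * x + 3 * y**2, 6 * x * y + z**3, 3 * y * z**2)

p0 = (1, 2, 1)                       # point on the level surface psi = 0
g = grad_psi(*p0)                    # components of grad psi at p0
length = math.sqrt(sum(c * c for c in g))
n = tuple(c / length for c in g)     # unit normal n = grad psi / |grad psi|

# Tangent plane: grad psi . (r - r0) = 0, i.e. g[0]*x + g[1]*y + g[2]*z = plane_rhs
plane_rhs = sum(gi * pi for gi, pi in zip(g, p0))
print(g, n, plane_rhs)
```

Multiplying the plane equation $\mathbf{n} \cdot \mathbf{r} = \mathbf{n} \cdot \mathbf{r}_0$ through by $|\operatorname{grad}\psi|$ removes the square root, which is why the sketch works with the unnormalized gradient when forming the plane.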
We have seen how the gradient operator associates a vector field $\operatorname{grad}\phi$
with every continuously differentiable scalar field $\phi$. Any vector field $\mathbf{F} = \operatorname{grad}\phi$
which is expressible as the gradient of a scalar field $\phi$ is called a
conservative vector field, and $\phi$ is then referred to as the scalar potential
associated with the vector field.
This has an important implication when line integrals involving conservative
vector fields are considered. Let us suppose that $\mathbf{F} = \operatorname{grad}\phi$, and
that
$$J = \int_A^B \mathbf{F} \cdot d\mathbf{r}. \tag{11-32}$$
Then
$$J = \int_A^B \operatorname{grad}\phi \cdot d\mathbf{r}, \tag{11-33}$$
and by virtue of Eqn (11-27) this can be written
$$J = \int_A^B d\phi = \phi(B) - \phi(A). \tag{11-34}$$
Hence when $\mathbf{F}$ belongs to a conservative vector field, results (11-32) and (11-34)
show that the line integral $J$ of $\mathbf{F}$ depends only on the end points of the path
of integration, and not on the path itself.
This fundamental result has far reaching consequences and forms the
basis of many important developments, of which gravitational potential
theory is but one. Suppose, for example, that $\mathbf{F}$ is identified with a conservative
force field, then result (11-34) represents the change in the potential
energy of a particle as it moves from point A to point B. That $J$ depends only
on the difference between $\phi(B)$ and $\phi(A)$ and not on the path joining A to B explains
why, when using potential energy considerations in mechanics, no consideration
need be given to the path that is followed.
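Path independence can be seen at work in a small numerical experiment. The Python sketch below is illustrative only: the potential $\phi = x^2y$, the two paths joining (0, 0, 0) to (2, 1, 0), and the midpoint-type quadrature are all assumptions chosen for this example. It estimates the line integral of $\mathbf{F} = \operatorname{grad}\phi$ along each path and compares both with $\phi(B) - \phi(A)$.

```python
def line_integral(F, path, n=2000):
    """Midpoint-type estimate of the integral of F . dr along path(t), 0 <= t <= 1."""
    total = 0.0
    for k in range(n):
        t0, t1 = k / n, (k + 1) / n
        p0, p1 = path(t0), path(t1)
        Fm = F(*path(0.5 * (t0 + t1)))          # field sampled at the parameter midpoint
        total += sum(f * (b - a) for f, a, b in zip(Fm, p0, p1))
    return total

phi = lambda x, y, z: x**2 * y                  # hypothetical scalar potential
F = lambda x, y, z: (2 * x * y, x**2, 0.0)      # F = grad(phi), differentiated by hand

straight = lambda t: (2 * t, t, 0.0)            # straight line from (0,0,0) to (2,1,0)
curved = lambda t: (2 * t**2, t**3, 0.0)        # a different path with the same end points

J_straight = line_integral(F, straight)
J_curved = line_integral(F, curved)
potential_difference = phi(2, 1, 0) - phi(0, 0, 0)
print(J_straight, J_curved, potential_difference)
```

Both estimates should approach the same number as `n` is increased. Replacing `F` by a field that is not a gradient, for instance `(y, 0.0, 0.0)`, gives different values along the two paths.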
11-5 An application to fluid mechanics
Let the velocity field $\mathbf{v}$ in a fluid as a function of position $(x, y, z)$ and the
time $t$ be denoted by
$$\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}, \tag{11-35}$$
where $v_i = v_i(x, y, z, t)$ for $i = 1, 2, 3$. Clearly, if at any fixed time $t = t_1$,
$d\mathbf{r}$ denotes a differential displacement along the line of flow at the point with
position vector $\mathbf{r} = \mathbf{r}(x, y, z, t_1)$ then $d\mathbf{r}$ must be parallel to the velocity
vector $\mathbf{v}$ at that point. Hence the respective components of $d\mathbf{r}$ and $\mathbf{v}$ must be
proportional. The lines determined in this manner, which are everywhere
tangential to the velocity field vector, are called the streamlines of the flow
field. More properly these should be called stream surfaces since in three
space dimensions they correspond to surfaces. In the case of a general vector
field $\mathbf{F}$, not necessarily defining a velocity field, they are called field lines.
The condition that $d\mathbf{r}$ must be parallel to $\mathbf{v}$ implies that the field lines or
streamlines must satisfy the equations
$$\frac{dx}{v_1} = \frac{dy}{v_2} = \frac{dz}{v_3}. \tag{11-36}$$
Equations of this form are called differential equations, and methods for
their solution will be explored systematically in the last three chapters.
If, now, $\mathbf{r}$ is the position vector of a fluid particle at time $t$, we have the
obvious vector equation
$$\frac{d\mathbf{r}}{dt} = \mathbf{v}, \tag{11-37}$$
which implies the three scalar differential equations
$$\frac{dx}{dt} = v_1, \quad \frac{dy}{dt} = v_2, \quad \frac{dz}{dt} = v_3. \tag{11-38}$$
Together, the solutions of these last three equations define curves called the
particle trajectories. The particle trajectories are functions of the time, and
are so named because they describe the path followed by individual fluid
particles, as they pass through the flow field.
Example 11-15 Find the differential equations of the streamlines and
particle trajectories corresponding to the fluid velocity field $\mathbf{v} = 2t^2x\,\mathbf{i} + ty\,\mathbf{j} + 3z^2\,\mathbf{k}$.
Solution In this case $v_1 = 2t^2x$, $v_2 = ty$, and $v_3 = 3z^2$, showing that the
differential equations describing the streamlines are
$$\frac{dx}{2t^2x} = \frac{dy}{ty} = \frac{dz}{3z^2} \quad (\text{with } t = \text{constant}),$$
whilst the differential equations describing the particle trajectories are
$$\frac{dx}{dt} = 2t^2x, \quad \frac{dy}{dt} = ty, \quad \frac{dz}{dt} = 3z^2.$$
In this case all of these differential equations are of the type called variables
separable which may be solved by direct integration after some slight rearrangement.
Although the main discussion of the solution of differential
equations will be postponed until the final chapters of this book, it will be
instructive to solve the ones that have arisen in connection with this problem.
The differential equations defining the streamlines are equivalent to two
different relationships between the three space variables $x$, $y$, and $z$. We
choose to work with the first and last pairs of the equations which are,
respectively, equivalent to
$$\frac{dx}{x} = 2t\frac{dy}{y} \quad\text{and}\quad \frac{dy}{y} = \frac{t\,dz}{3z^2},$$
with $t$ regarded as a constant. Taking the antiderivatives of these gives
$$\log x = 2t\{\log y + \text{constant}\} \quad\text{and}\quad \log y = -\frac{t}{3z} + \text{constant}.$$
Re-arrangement shows that the streamlines or stream surfaces are described
by the equations
$$x = (C_1y)^{2t}, \quad y = e^{C_2t/3}e^{-t/3z},$$
where $C_1$, $C_2$ are arbitrary constants. If flow in the plane $z = z_0$ is considered,
then these equations define a curve that is correctly called a streamline. It
would be the curve
$$x = C_1^{2t}e^{2C_2t^2/3}e^{-2t^2/3z_0}, \quad y = e^{C_2t/3}e^{-t/3z_0}.$$
The particle trajectories are found in similar fashion by finding the
antiderivatives of
$$\frac{dx}{x} = 2t^2\,dt, \quad \frac{dy}{y} = t\,dt, \quad \frac{dz}{z^2} = 3\,dt.$$
Hence
$$\log x = \tfrac{2}{3}t^3 + \text{constant}, \quad \log y = \frac{t^2}{2} + \text{constant}, \quad -\frac{1}{z} = 3t + \text{constant},$$
showing that
$$x = C_3e^{2t^3/3}, \quad y = C_4e^{t^2/2}, \quad z = \frac{1}{C_5 - 3t},$$
where $C_3$, $C_4$, and $C_5$ are arbitrary constants. The position vector of a
particle must thus be
$$\mathbf{r} = C_3e^{2t^3/3}\,\mathbf{i} + C_4e^{t^2/2}\,\mathbf{j} + \frac{1}{C_5 - 3t}\,\mathbf{k}.$$
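The closed-form particle trajectory of Example 11-15 can be compared with a direct numerical integration of $d\mathbf{r}/dt = \mathbf{v}$. The Python sketch below is an illustration under stated assumptions: the velocity field is taken to be $\mathbf{v} = 2t^2x\,\mathbf{i} + ty\,\mathbf{j} + 3z^2\,\mathbf{k}$, the particle starts at (1, 1, 1) at $t = 0$ (so that $C_3 = C_4 = C_5 = 1$), and a classical fourth-order Runge-Kutta scheme, which is not a method discussed in the text, is used for the integration.

```python
import math

def velocity(t, p):
    """Velocity field v = 2 t^2 x i + t y j + 3 z^2 k evaluated at position p."""
    x, y, z = p
    return (2 * t**2 * x, t * y, 3 * z**2)

def rk4(f, t, p, t_end, steps):
    """Classical fourth-order Runge-Kutta integration of dp/dt = f(t, p)."""
    h = (t_end - t) / steps
    for _ in range(steps):
        k1 = f(t, p)
        k2 = f(t + h / 2, tuple(pi + h / 2 * ki for pi, ki in zip(p, k1)))
        k3 = f(t + h / 2, tuple(pi + h / 2 * ki for pi, ki in zip(p, k2)))
        k4 = f(t + h, tuple(pi + h * ki for pi, ki in zip(p, k3)))
        p = tuple(pi + h / 6 * (a + 2 * b + 2 * c + d)
                  for pi, a, b, c, d in zip(p, k1, k2, k3, k4))
        t += h
    return p

def closed_form(t, C3=1.0, C4=1.0, C5=1.0):
    """Trajectory x = C3 exp(2t^3/3), y = C4 exp(t^2/2), z = 1/(C5 - 3t)."""
    return (C3 * math.exp(2 * t**3 / 3), C4 * math.exp(t**2 / 2), 1.0 / (C5 - 3 * t))

# With C5 = 1 the z-component becomes unbounded at t = 1/3, so stop well before that.
numeric = rk4(velocity, 0.0, (1.0, 1.0, 1.0), 0.2, 2000)
exact = closed_form(0.2)
print(numeric)
print(exact)
```

The unbounded growth of $z$ as $3t$ approaches $C_5$ is a feature of the equation $dz/dt = 3z^2$ itself and is worth bearing in mind when choosing the end time.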
PROBLEMS
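Section 11-1
11-1 Sketch and give a brief description of the curves described by the following
vector functions of a single real variable $t$:
(a) $\mathbf{r} = a\cos 2\pi t\,\mathbf{i} + b\sin 2\pi t\,\mathbf{j} + t\mathbf{k}$;
(b) $\mathbf{r} = a\cos 2\pi t\,\mathbf{i} + b\sin 2\pi t\,\mathbf{j} + t^2\mathbf{k}$;
(c) $\mathbf{r} = t\mathbf{i} + t^2\mathbf{j} + t^3\mathbf{k}$.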
11-2 State which of the following vector functions are everywhere continuous and,
if they have points of discontinuity, where these occur:
( a > u w=(rr^) i+ (r^)j + ' 2k;
(b) U (o = (7-Z--2) * + tan 'i + ?e ~' k ;
(c) $\mathbf{u}(t) = \tanh t\,\mathbf{i} + \cosh t\,\mathbf{j} + t\sinh t\,\mathbf{k}$;
(d)u(0= (^-L^-X i + I sin r| j + 3k.
11-3 A vector function $\mathbf{u}(t)$ of a real variable $t$ may be assigned left- and right-hand
limits $\mathbf{u}(t_0-) = \lim_{t \to t_0-} \mathbf{u}(t)$ and $\mathbf{u}(t_0+) = \lim_{t \to t_0+} \mathbf{u}(t)$ with respect to the point $t_0$
in an obvious manner. The vector function $\mathbf{u}(t)$ will be continuous at $t = t_0$
if $\mathbf{u}(t_0-) = \mathbf{u}(t_0+)$. Use this concept to determine which of the following
vector functions are continuous at the stated points:
, .. , . sinh 2t . ,. , ,
(a) u(0 = i + /e'j + cosh tk at t = 0;
(fn _ a n\
1 i + cosh t\ + tanh tk at t = a.
11-4 Form the vector functions $\mathbf{u}(t) + \mathbf{v}(t)$, $\mathbf{u}(t) \times \mathbf{v}(t)$, and the scalar function
$\mathbf{u}(t) \cdot \mathbf{v}(t)$ given that:
„(,) = ,2 i + sinhfj+ jj_lfj
and
$\mathbf{v}(t) = 2t\,\mathbf{i} + \cosh t\,\mathbf{j} + \sin t\,\mathbf{k}$.
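11-5 Determine $d\mathbf{u}/dt$ and $d^2\mathbf{u}/dt^2$ for the vectors $\mathbf{u}$ defined in (a) and (c) of
Problem 11-1, and find $d\mathbf{u}/dt$ for the vector
$$\mathbf{u} = \frac{\mathbf{r}}{r} + (\mathbf{a} \cdot \mathbf{r})\mathbf{b} + \mathbf{a} \times \frac{d^2\mathbf{r}}{dt^2},$$
where $\mathbf{r} = \mathbf{r}(t)$, $r = |\mathbf{r}|$ and $\mathbf{a}$, $\mathbf{b}$ are constant vectors.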
11-6 The position vector of a particle at time $t$ is
$$\mathbf{r} = \cos(t - 1)\,\mathbf{i} + \sinh(t - 1)\,\mathbf{j} + \alpha t^3\mathbf{k}.$$
Find the condition imposed on $\alpha$ by requiring that at time $t = 1$ the acceleration
vector is normal to the position vector.
11-7 Find the unit tangent $\mathbf{T}$ to the curve
$$\mathbf{r} = t\mathbf{i} + t^2\mathbf{j} + t^3\mathbf{k}$$
at the points corresponding to $t = 0$ and $t = 1$.
11-8 Prove results (a) to (c) of Theorem 11-3.
11-9 Prove result (d) of Theorem 11-3: (a) by expansion of the vector product
u x v followed by subsequent recombination of the results; and (b) using
determinants.
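11-10 Find $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ when $t = \tfrac{1}{4}\pi$, given that
$$\mathbf{r} = (\cos t + \sin^2 t)\mathbf{i} + \sin t(1 - \cos t)\mathbf{j} - \cos t\,\mathbf{k}.$$
11-11 Find $\mathbf{B}$, $\mathbf{T}$, $\mathbf{N}$, $\kappa$, and $\tau$ for the helix
$$\mathbf{r} = (1 - \cos t)\mathbf{i} + \sin t\,\mathbf{j} + t\mathbf{k},$$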
when t = 1 77.
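11-12 Prove that if $\mathbf{r}(t)$ is a suitably differentiable function of the real variable $t$,
and $s$ is the arc length along the parametrically defined curve $\mathbf{r} = \mathbf{r}(t)$, then
with the usual notation
$$\frac{d\mathbf{r}}{dt} = \frac{ds}{dt}\mathbf{T}, \quad \frac{d^2\mathbf{r}}{dt^2} = \frac{d^2s}{dt^2}\mathbf{T} + \kappa\left(\frac{ds}{dt}\right)^2\mathbf{N}.$$
Hence show that
$$\frac{d\mathbf{r}}{dt} \times \frac{d^2\mathbf{r}}{dt^2} = \kappa\left(\frac{ds}{dt}\right)^3\mathbf{B},$$
and deduce that
$$\mathbf{T} = \frac{d\mathbf{r}/dt}{|d\mathbf{r}/dt|}, \quad \mathbf{B} = \frac{\dfrac{d\mathbf{r}}{dt} \times \dfrac{d^2\mathbf{r}}{dt^2}}{\left|\dfrac{d\mathbf{r}}{dt} \times \dfrac{d^2\mathbf{r}}{dt^2}\right|}, \quad \kappa = \frac{1}{\rho} = \frac{\left|\dfrac{d\mathbf{r}}{dt} \times \dfrac{d^2\mathbf{r}}{dt^2}\right|}{\left|\dfrac{d\mathbf{r}}{dt}\right|^3}.$$
11-13 Apply the results of Problem 11-12 to deduce $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ for the curve
$$\mathbf{r} = t\mathbf{i} + t^2\mathbf{j} + t^3\mathbf{k}$$
when $t = 1$.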
11-14 This problem gives an elementary derivation of the radius and circle of
curvature for a plane curve. If at a point $(\xi, \eta)$ on a curve $y = f(x)$, a circle
of radius $\rho$ and centre $(\alpha, \beta)$ is tangent to the curve and has the same second
derivative, then it is called the circle of curvature at $(\xi, \eta)$. The number $\rho$ is
called the radius of curvature and $(\alpha, \beta)$ the centre of curvature.
Let the circle of curvature at $(\xi, \eta)$ have the equation $(X - \alpha)^2 + (Y - \beta)^2 = \rho^2$,
where $(X, Y)$ is a general point on the circle (see figure).
By differentiation of this equation with respect to $X$, and using the tangency
condition $dY/dX = f'(\xi)$ at $(\xi, \eta)$, show that
$$(\xi - \alpha) + (\eta - \beta)f'(\xi) = 0.$$
By a further differentiation of the equation with respect to $X$, and by using
the equality of second derivatives $d^2Y/dX^2 = f''(\xi)$ at $(\xi, \eta)$, show that
$$1 + (f'(\xi))^2 + (\eta - \beta)f''(\xi) = 0.$$
Use the fact that $(\xi, \eta)$ lies on the circle of curvature to deduce that
$$\alpha = \xi - \frac{f'(\xi)}{f''(\xi)}\left(1 + (f'(\xi))^2\right)$$
and that
$$\rho = \frac{\left[1 + (f'(\xi))^2\right]^{3/2}}{|f''(\xi)|}.$$
Find the centre and radius of curvature of the curve $y = 1 + x^2$ at the
point (1, 1).
11-15 Use the results of Problem 11-12 to show that the circular helix
$$\mathbf{r} = a\cos t\,\mathbf{i} + a\sin t\,\mathbf{j} + bt\,\mathbf{k}$$
has the constant radius of curvature $\rho = (a^2 + b^2)/a$.
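11-16 Show from the Serret-Frenet equations and the fact that $\mathbf{T} = d\mathbf{r}/ds$, that
the torsion $\tau$ may be expressed in the form:
$$\tau = \frac{\dfrac{d\mathbf{r}}{ds} \cdot \left[\dfrac{d^2\mathbf{r}}{ds^2} \times \dfrac{d^3\mathbf{r}}{ds^3}\right]}{\left|\dfrac{d^2\mathbf{r}}{ds^2}\right|^2}.$$
11-17 If $\mathbf{r} = \mathbf{r}(t)$, where the parameter $t$ is not the arc length along the curve,
prove that
$$\frac{d\mathbf{r}}{dt} = \frac{d\mathbf{r}}{ds}\frac{ds}{dt},$$
$$\frac{d^2\mathbf{r}}{dt^2} = \frac{d^2\mathbf{r}}{ds^2}\left(\frac{ds}{dt}\right)^2 + \frac{d\mathbf{r}}{ds}\frac{d^2s}{dt^2},$$
$$\frac{d^3\mathbf{r}}{dt^3} = \frac{d^3\mathbf{r}}{ds^3}\left(\frac{ds}{dt}\right)^3 + 3\frac{d^2\mathbf{r}}{ds^2}\frac{ds}{dt}\frac{d^2s}{dt^2} + \frac{d\mathbf{r}}{ds}\frac{d^3s}{dt^3}.$$
Hence show that
$$\frac{d\mathbf{r}}{dt} \cdot \left(\frac{d^2\mathbf{r}}{dt^2} \times \frac{d^3\mathbf{r}}{dt^3}\right) = \frac{d\mathbf{r}}{ds} \cdot \left(\frac{d^2\mathbf{r}}{ds^2} \times \frac{d^3\mathbf{r}}{ds^3}\right)\left(\frac{ds}{dt}\right)^6.$$
Use the result of Problem 11-12 to deduce that
$$\tau = \frac{\dfrac{d\mathbf{r}}{dt} \cdot \left[\dfrac{d^2\mathbf{r}}{dt^2} \times \dfrac{d^3\mathbf{r}}{dt^3}\right]}{\left|\dfrac{d\mathbf{r}}{dt} \times \dfrac{d^2\mathbf{r}}{dt^2}\right|^2} \quad\text{and}\quad \kappa = \frac{1}{\rho} = \frac{\left|\dfrac{d\mathbf{r}}{dt} \times \dfrac{d^2\mathbf{r}}{dt^2}\right|}{\left|\dfrac{d\mathbf{r}}{dt}\right|^3}.$$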
11-18 Apply the result of Problem 11-17 to find the torsion $\tau$ of the non-constant
pitch helix
r = cos ti
. . /e ( — e~ e \
I + S1I1/J + / jk.
Section 11-2
11-19 Find the antiderivative of the following two functions $\mathbf{f}(t)$:
(a) f(0 = cosh 2ti + - j + t 3 k;
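(b) $\mathbf{f}(t) = t^2\sin t\,\mathbf{i} + e^t\,\mathbf{j} + \log t\,\mathbf{k}$.
11-20 Verify the following antiderivatives using Definition 11-3:
(a) $\displaystyle\int \mathbf{r} \cdot \frac{d\mathbf{r}}{dt}\,dt = \tfrac{1}{2}(\mathbf{r} \cdot \mathbf{r}) + C = \tfrac{1}{2}r^2 + C$;
(b) $\displaystyle\int \left(\frac{d\mathbf{r}}{dt} \cdot \frac{d^2\mathbf{r}}{dt^2}\right)dt = \frac{1}{2}\frac{d\mathbf{r}}{dt} \cdot \frac{d\mathbf{r}}{dt} + C = \frac{1}{2}\left(\frac{d\mathbf{r}}{dt}\right)^2 + C$;
(c) $\displaystyle\int \mathbf{r} \times \frac{d^2\mathbf{r}}{dt^2}\,dt = \mathbf{r} \times \frac{d\mathbf{r}}{dt} + \mathbf{C}$,
where $C$, $\mathbf{C}$ are arbitrary constants.
11-21 Use the result of Problem 11-20 to express $d\mathbf{r}/dt$ in terms of $\mathbf{r}$, given that $\mathbf{r}$
satisfies the vector differential equation
$$\frac{d^2\mathbf{r}}{dt^2} + n^2\mathbf{r} = \mathbf{0}.$$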
11-22 Evaluate the definite integral
>
(t 2 e ( i + t log t\ + t 2 k)dt.
J\
11-23 The displacement of a particle P is given in terms of the time $t$ by
$$\mathbf{r} = \cos 2t\,\mathbf{i} + \sin 2t\,\mathbf{j} + t^2\mathbf{k}.$$
If $v$ and $f$ are the magnitudes of the velocity and acceleration respectively,
show that
11-24 A point moving in space has acceleration $\cos t\,\mathbf{i} + \sin t\,\mathbf{j}$. Find the equation
of its path if it passes through the point $(-1, 0, 0)$ with velocity $-\mathbf{j} + \mathbf{k}$ at
time $t = 0$.
11-25 Evaluate the line integral of $\mathbf{F} = xy\,\mathbf{i} + yz\,\mathbf{j} + z\,\mathbf{k}$ along the contour defined
by $\mathbf{r} = t\mathbf{i} + t^2\mathbf{j} + t^3\mathbf{k}$ from $t = 0$ to $t = 1$.
11-26 Evaluate the line integral of F = 4xyi — 2x 2 \ + 3zk from the origin to the
point (2, 1, 0) along the contour:
(a) from the origin to the point (2, 0, 0) and then from the point (2, 0, 0) to
the point (2, 1,0);
(b) from the origin to the point (0, 1 , 0) and then from the point (0, 1 , 0) to
the point (2, 1,0);
(c) from the origin to the point (2, 1, 0) along the straight line joining these
two points.
(Hint : the contours (a), (b), (c) all lie in the plane z = 0.)
11-27 Evaluate the line integral of $\mathbf{F} = 4xy\,\mathbf{i} + 2x^2\,\mathbf{j} + 3z\,\mathbf{k}$ from the origin to the
point (2, 1, 0) along the contours of Problem 11-26.
Section 11-3
11-28 A particle moves in a curve given by
$$r = a(1 - \cos\theta) \quad\text{with}\quad \frac{d\theta}{dt} = 3.$$
Find the components of velocity and acceleration. Show that the velocity is
zero when $\theta = 0$. Find the acceleration when $\theta = 0$.
11-29 A particle moves on that portion of the curve $r = ae^{\theta}\cos\theta$ ($a$ = constant)
for which $0 < \theta < \tfrac{1}{4}\pi$, so that its radial velocity $u$ remains constant. Find
its transverse velocity and its radial and transverse components of acceleration
as functions of $u$ and $\theta$.
11-30 If the fluid velocity $\mathbf{v} = y\,\mathbf{i} + 2x\,\mathbf{j}$, determine the circulation $\gamma$ by integrating
anti-clockwise around the rectangular contour $x = \pm a$, $y = \pm b$. Show that
the sign of $\gamma$ is reversed if the direction of integration is taken clockwise
around the same contour.
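11-31 Consider the three rectangular regions (a) $0 \le x \le 1$, $-1 \le y \le 1$,
(b) $0 \le x \le 1$, $1 \le y \le 2$, and (c) $0 \le x \le 1$, $-1 \le y \le 2$ and denote
their boundary curves by $\Gamma_1$, $\Gamma_2$, and $\Gamma_3$. If $\mathbf{F} = 2y\,\mathbf{i} + x\,\mathbf{j}$, evaluate the three
line integrals
$$J_1 = \oint_{\Gamma_1} \mathbf{F} \cdot d\mathbf{r}, \quad J_2 = \oint_{\Gamma_2} \mathbf{F} \cdot d\mathbf{r}, \quad J_3 = \oint_{\Gamma_3} \mathbf{F} \cdot d\mathbf{r},$$
and hence show that $J_1 + J_2 = J_3$.
11-32 Given that $\mathbf{F} = \cos y\,\mathbf{i} + \sin x\,\mathbf{j}$, evaluate the line integral of $\mathbf{F}$ taken anti-clockwise
around the triangle with vertices at the points $(0, 0)$, $(\tfrac{1}{4}\pi, 0)$,
$(\tfrac{1}{4}\pi, \tfrac{1}{4}\pi)$.
11-33 A vector field $\mathbf{F}$ is said to be irrotational if its line integral around any closed
curve $\Gamma$ is zero. By integrating around two conveniently chosen contours,
deduce which of the following vector fields are irrotational:
(a) $\mathbf{F} = y\sinh z\,\mathbf{i} + x\sinh z\,\mathbf{j} + xy\cosh z\,\mathbf{k}$;
(b) $\mathbf{F} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$;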
(c) F = xyzH + x 2 z a \ + x 2 yzk.
Section 11-4
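11-34 Find the gradient of the following functions $\phi$:
(a) $\phi = \cosh xyz$;
(b) $\phi = x^2 + y^2 + z^2$;
(c) $\phi = xy\tanh(x - z)$.
11-35 Find the directional derivative of the following functions $\phi$ in the direction
of the vector $(\mathbf{i} + 2\mathbf{j} - 2\mathbf{k})$:
(a) $\phi = 3x^2 + xy^2 + yz$;
(b) $\phi = x^2yz + \cos y$;
(c) $\phi = 1/xyz$.
11-36 If new independent variables $\xi$, $\eta$, $\zeta$ are introduced through the equations
$\xi = x + \alpha$, $\eta = y + \beta$, and $\zeta = z + \gamma$, where $\alpha$, $\beta$, and $\gamma$ are constants,
and $\phi$ is a suitably differentiable function, prove that
$$\left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right)\phi = \left(\mathbf{i}\frac{\partial}{\partial\xi} + \mathbf{j}\frac{\partial}{\partial\eta} + \mathbf{k}\frac{\partial}{\partial\zeta}\right)\phi.$$
Deduce from this result the fact that $\operatorname{grad}\phi$ is unchanged by a translation
of the origin of the coordinates. This property is described by saying that
$\operatorname{grad}\phi$ is invariant with respect to a translation of the coordinate system.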
11-37 If new independent variables $\xi$, $\eta$, $\zeta$ are introduced through the equations
$$\xi = a_{11}x + a_{12}y + a_{13}z,$$
$$\eta = a_{21}x + a_{22}y + a_{23}z,$$
$$\zeta = a_{31}x + a_{32}y + a_{33}z,$$
and $\phi$ is a suitably differentiable function, prove that
/. 8 . 8 , 8 \ , I. <> .8 . 3 \
Deduce from this result the fact that $\operatorname{grad}\phi$ is unchanged by a rotation of
the coordinate system. This property is described by saying that $\operatorname{grad}\phi$ is
invariant with respect to a rotation of the coordinate system.
11-38 If $\mathbf{a}$ is a constant vector and $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$, $r = |\mathbf{r}|$ prove that
(a) $\operatorname{grad}(\mathbf{a} \cdot \mathbf{r}) = \mathbf{a}$;
(b) $\operatorname{grad} r = \mathbf{r}/r$;
(c) $\operatorname{grad}(1/r) = -\mathbf{r}/r^3$.
11-39 By using the Cartesian representation of $\operatorname{grad}\phi$ as expressed in Definition
11-5, prove that
(a) $\operatorname{grad}(a\phi + b\psi) = a\operatorname{grad}\phi + b\operatorname{grad}\psi$;
(b) $\operatorname{grad}(\phi\psi) = \phi\operatorname{grad}\psi + \psi\operatorname{grad}\phi$,
where $a$, $b$ are scalar constants and $\phi$, $\psi$ are suitably differentiable functions.
11-40 A vector field $\mathbf{F}$ will be irrotational if it is expressible in the form $\mathbf{F} = \operatorname{grad}\phi$,
with $\phi$ a scalar potential. Find the most general scalar potential $\phi$ that will
give rise to the irrotational vector field
F = O + $y 2 z 2 )i + xyz 2 ) + xy 2 zk.
11-41 Find the unit normal $\mathbf{n}$ to the surface $x^2 + 2y^2 - z^2 - 8 = 0$ at the point
(1, 2, 1). Deduce the equation of the tangent plane to the surface at this
point.
11-42 Find the unit normal $\mathbf{n}$ to the surface $x^2 - 4y^2 + 2z^2 = 6$ at the point
(2, 2, 3). Deduce the equation of the plane which has $\mathbf{n}$ as its normal and
which passes through the origin.
11-43 If $(x_0, y_0, z_0)$ is a point on the conic surface
$$\frac{x^2}{a} + \frac{y^2}{b} + \frac{z^2}{c} = 1,$$
show that the tangent plane to the surface at that point is
$$\frac{xx_0}{a} + \frac{yy_0}{b} + \frac{zz_0}{c} = 1.$$
11-44 The vector field $\mathbf{F}$ is generated by the scalar potential $\phi = x^2y$. Verify
directly by integration that the line integral of $\mathbf{F}$ along each of the three paths
of Problem 11-26 is equal to 4. Confirm this result by using the fact that if
$\mathbf{F} = \operatorname{grad}\phi$, then
$$\int_A^B \mathbf{F} \cdot d\mathbf{r} = \phi(B) - \phi(A).$$
11-45 The Newtonian law of gravitation asserts that the force of attraction between
point masses $m_1$, $m_2$ distant $r$ apart acts along the line joining them and has
magnitude $(Gm_1m_2)/r^2$, where $G$ is the gravitational constant. Show that
this force law corresponds to a potential $\phi = (Gm_1m_2)/r$.
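11-46 If $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$ is a vector field, then the scalar operator $\mathbf{v} \cdot \operatorname{grad}$
expressed in Cartesian coordinates is defined to be
$$\mathbf{v} \cdot \operatorname{grad} = \mathbf{v} \cdot \nabla = v_1\frac{\partial}{\partial x} + v_2\frac{\partial}{\partial y} + v_3\frac{\partial}{\partial z}.$$
Hence if $\mathbf{F}$, $\phi$ are suitably differentiable vector and scalar fields, respectively,
it follows that $\mathbf{v} \cdot \operatorname{grad}\phi$ is a scalar and $\mathbf{v} \cdot \operatorname{grad}\mathbf{F}$ is a vector. Given that
$$\phi = x^3yz^2,$$
find
(a) $\mathbf{v} \cdot \operatorname{grad}\phi$;
(b) $\mathbf{v} \cdot \operatorname{grad}\mathbf{F}$;
(c) $\mathbf{v} \cdot \operatorname{grad}\mathbf{v}$.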
y = xyi+yj + xzk, F = xH + y*] - z 2 k,
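11-47 Special differential operators called the divergence and the curl of a vector
can be defined in terms of Cartesian coordinates by means of the operator $\nabla$.
If $\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j} + F_3\mathbf{k}$ is a suitably differentiable vector field, then the
divergence of $\mathbf{F}$ is denoted either by $\operatorname{div}\mathbf{F}$ or $\nabla \cdot \mathbf{F}$ and is the scalar defined
by
$$\operatorname{div}\mathbf{F} \equiv \nabla \cdot \mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}.$$
The curl of $\mathbf{F}$ is denoted either by $\operatorname{curl}\mathbf{F}$ or $\nabla \times \mathbf{F}$ and is the vector defined by
$$\operatorname{curl}\mathbf{F} \equiv \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ F_1 & F_2 & F_3 \end{vmatrix}.$$
Show that
$$\operatorname{curl}\mathbf{F} = \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)\mathbf{i} + \left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right)\mathbf{j} + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\mathbf{k}.$$
If $\phi$ is a suitably differentiable scalar function show by direct substitution
into the definitions that
(a) $\operatorname{div}(\phi\mathbf{F}) = \mathbf{F} \cdot \operatorname{grad}\phi + \phi\operatorname{div}\mathbf{F}$;
(b) $\operatorname{curl}(\phi\mathbf{F}) = \operatorname{grad}\phi \times \mathbf{F} + \phi\operatorname{curl}\mathbf{F}$.
11-48 Find $\nabla \cdot \mathbf{F}$ and $\nabla \times \mathbf{F}$ given that
$$\mathbf{F} = x^2y^2\,\mathbf{i} + y^2z^2\,\mathbf{j} + xz\,\mathbf{k}.$$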
11-49 Prove from the definitions that
(a) $\operatorname{curl}\operatorname{grad}\phi = \mathbf{0}$;
(b) $\operatorname{div}\operatorname{curl}\mathbf{F} = 0$,
where $\phi$, $\mathbf{F}$ are suitably differentiable scalar and vector functions respectively.
11-50 Give an example of a differentiable scalar potential $\phi$ and vector field $\mathbf{F}$.
Use them to confirm the results of Problem 11-49 by means of direct differentiation.
11-51 In the slow one-dimensional flow of a viscous fluid between parallel plates
the velocity field has the form
-(-*)
k,
where the plates coincide with the planes x = ±d and the z-axis points in
the direction of flow. By selecting a convenient contour in the (x, z)-plane,
prove that the circulation is non-zero so that the flow cannot be irrotational.
Section 11-5
11-52 The velocity field describing a fluid flow is
$$\mathbf{v} = 2\mathbf{i} + yt\,\mathbf{j} + \mathbf{k}.$$
Write down the differential equations describing the streamlines and the
particle trajectories and solve them as in Example 11-15 in the text.
Series, Taylor's theorem
and its uses
12-1 Series
The term series denotes the sum of the members of a sequence of numbers
$\{a_n\}$, in which $a_n$ represents the general term. The number of terms added may
be finite or infinite, according as the sequence used is finite or infinite in the
sense of Chapter 3. The sum to $N$ terms of the infinite sequence $\{a_n\}$ is written
$$a_1 + a_2 + \cdots + a_N = \sum_{n=1}^{N} a_n,$$
and it is called a finite series because the number of terms involved in the
summation is finite. The so called infinite series derived from the infinite
sequence $\{a_n\}$ by the addition of all its terms is written
$$a_1 + a_2 + \cdots + a_r + \cdots = \sum_{n=1}^{\infty} a_n.$$
The following are specific examples of numerical series of essentially
different types:
(a) $\displaystyle\sum_{n=1}^{N} n^2 = 1^2 + 2^2 + \cdots + N^2$,
in which the general term $a_n = n^2$;
(b) $\displaystyle\sum_{n=0}^{\infty} \frac{1}{n!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{r!} + \cdots$,
in which the general term $a_n = 1/n!$;
(c) $\displaystyle\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{r} + \cdots$,
in which the general term $a_n = 1/n$;
(d) $\displaystyle\sum_{n=1}^{\infty} \frac{2n^2 + 1}{4n + 2} = \frac{1}{2} + \frac{9}{10} + \frac{19}{14} + \cdots + \frac{2r^2 + 1}{4r + 2} + \cdots$,
in which the general term $a_n = (2n^2 + 1)/(4n + 2)$;
(e) $\displaystyle\sum_{n=1}^{\infty} (-1)^{n+1} = 1 - 1 + 1 - 1 + \cdots + (-1)^{r+1} + \cdots$,
in which the general term $a_n = (-1)^{n+1}$.
Only (a) is a finite series; the remainder are infinite.
There is obviously no difficulty in assigning a sum to a finite series, but
how are we to do this in the case of an infinite series? A practical approach
would be to attempt to approximate the infinite series by means of a finite
series comprising only its first $N$ terms. To justify this it would be necessary
to show in some way that the sum of the remainder $R_N$ of the series after $N$
terms tends to zero as $N$ increases and, even better if possible, to obtain an
upper bound for $R_N$. This was, of course, the approach adopted in Chapter 6
when discussing the exponential series which comprises example (b). In the
event of an upper bound for $R_N$ being available, this could be used to deduce
the number of terms that need be taken in order to determine the sum to
within a specified accuracy.
The spirit of this practical approach to the summation of series is exactly
what is adopted in a rigorous discussion of series. The first question to be
determined is whether or not a given series has a unique sum; the estimation
of the remainder term follows afterwards, and usually proves to be more
difficult.
To assist us in our formal discussion of series we use the already familiar
notion of the nth partial sum $S_n$ of the series $\sum_{n=1}^{\infty} a_n$, which is defined to be the
finite sum
$$S_n = \sum_{r=1}^{n} a_r = a_1 + a_2 + \cdots + a_n.$$
Then, in terms of $S_n$, we have the following definition of convergence, which
is in complete agreement with the approach we have just outlined.
definition 12-1 (convergence of series) The series $\sum_{n=1}^{\infty} a_n$ will be said to be
convergent to the finite sum $S$ if its nth partial sum $S_n$ is such that
$$\lim_{n\to\infty} S_n = S.$$
If the limit of $S_n$ is not defined, or is infinite, the series will be said to be
divergent.
The remainder after $n$ terms, $R_n$, is given by
$$R_n = a_{n+1} + a_{n+2} + \cdots + a_{n+r} + \cdots,$$
so that if $\{S_n\}$ converges to the limit $S$, then $R_n = S - S_n$ and Definition
12-1 is obviously equivalent to requiring that
$$\lim_{n\to\infty} (S - S_n) = \lim_{n\to\infty} R_n = 0.$$
Example 12-1 Find the nth partial sum of the series
$$1 + \frac{1}{3} + \frac{1}{9} + \frac{1}{27} + \cdots + \frac{1}{3^n} + \cdots$$
and hence show that it converges to the sum 3/2. Find the remainder after
n terms and deduce how many terms need be summed in order to yield a
result in which the error does not exceed 0.01.
Solution This series is a geometric progression with initial term unity and
common ratio 1/3. Its sum to n terms, which is the desired nth partial sum
$S_n$, may be determined by a well known formula (see Problem 12-2) which
gives
$$S_n = \frac{1 - (1/3)^n}{1 - (1/3)} = \frac{3}{2}\left[1 - (1/3)^n\right].$$
We have
$$\lim_{n\to\infty} S_n = \lim_{n\to\infty} \frac{3}{2}\left[1 - (1/3)^n\right] = 3/2,$$
showing that the series is convergent to the sum 3/2.
As $S_n$ is the sum to n terms, the remainder after n terms, $R_n$, must be
given by $R_n = 3/2 - S_n$, and so
$$R_n = \frac{1}{2}\left(\frac{1}{3}\right)^{n-1}.$$
If the remainder must not exceed 0.01, then $R_n < 0.01$, from which it is easily
seen that the number n of terms needed is $n \ge 5$. The determination of $R_n$
was simple in this instance because we were fortunate enough to have available
an explicit formula for $S_n$. In general such a formula is seldom available.
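The arithmetic of Example 12-1 can be checked directly. The following Python sketch adds the terms exactly, using rational arithmetic, and searches for the smallest n whose remainder falls below the tolerance. The function names `partial_sum`, `remainder` and `terms_needed` are illustrative choices and do not come from the text.

```python
from fractions import Fraction

def partial_sum(n):
    """S_n = 1 + 1/3 + ... + 1/3**(n-1), added term by term."""
    return sum(Fraction(1, 3**k) for k in range(n))

def remainder(n):
    """R_n = 3/2 - S_n, using the sum 3/2 found in the example."""
    return Fraction(3, 2) - partial_sum(n)

def terms_needed(tol):
    """Smallest n whose remainder is below tol."""
    n = 1
    while remainder(n) >= tol:
        n += 1
    return n

n = terms_needed(Fraction(1, 100))
print(n, remainder(n))   # number of terms and the exact remainder
```

Because the remainder is computed from the sum rather than from the closed formula, the sketch also serves as an independent check on the expression obtained for the remainder.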
The definition of convergence has immediate consequences as regards
the addition and subtraction of series. Suppose $\sum a_n$ and $\sum b_n$ are convergent
series with sums $\alpha$, $\beta$. (It is customary to omit summation limits when they
are not important.) Let their respective partial sums be $S_n = a_1 + a_2 + \cdots + a_n$,
$S_n' = b_1 + b_2 + \cdots + b_n$ and consider the series $\sum (a_n + b_n)$ which
has the partial sum $S_n'' = S_n + S_n'$. Then
$$\lim_{n\to\infty} S_n'' = \lim_{n\to\infty} (S_n + S_n') = \lim_{n\to\infty} S_n + \lim_{n\to\infty} S_n' = \alpha + \beta,$$
showing that
$$\sum_{n=1}^{\infty} (a_n + b_n) = \alpha + \beta.$$
A corresponding result for the difference of two series may be proved in
similar fashion. We have established the following general result.
theorem 12-1 (sum and difference of convergent series) If the series
$\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent to the respective sums $\alpha$ and $\beta$, then
$$\sum_{n=1}^{\infty} (a_n + b_n) = \alpha + \beta; \qquad \sum_{n=1}^{\infty} (a_n - b_n) = \alpha - \beta.$$
Example 12-2 Suppose that $a_n = (1/2)^n$ and $b_n = (1/3)^n$, so that the
series involved are again geometric progressions with $\sum_{n=0}^{\infty} (1/2)^n = 2$ and
$\sum_{n=0}^{\infty} (1/3)^n = 3/2$. Then it follows from Theorem 12-1 that
$$\sum_{n=0}^{\infty} \left[(1/2)^n + (1/3)^n\right] = 7/2 \quad \text{and} \quad \sum_{n=0}^{\infty} \left[(1/2)^n - (1/3)^n\right] = 1/2.$$
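Theorem 12-1 can be illustrated numerically with the two geometric progressions of Example 12-2. The sketch below is a check under stated assumptions rather than part of the original example: the sums are taken from n = 0, which is the indexing that gives the quoted values 2 and 3/2, and the helper name `geometric_partial` is hypothetical.

```python
from fractions import Fraction

def geometric_partial(ratio, n_terms):
    """Partial sum ratio**0 + ratio**1 + ... with n_terms terms."""
    return sum(ratio**n for n in range(n_terms))

half, third = Fraction(1, 2), Fraction(1, 3)
N = 40
s_a = geometric_partial(half, N)     # approaches 2
s_b = geometric_partial(third, N)    # approaches 3/2
s_sum = sum(half**n + third**n for n in range(N))
s_diff = sum(half**n - third**n for n in range(N))
print(float(s_a), float(s_b), float(s_sum), float(s_diff))
```

For every finite N the partial sum of the combined series equals the sum of the separate partial sums exactly, which is the step on which the proof of the theorem rests.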
Let us now derive a number of standard tests by which the convergence
or divergence of a series may be established. We begin with a test for
divergence.
Suppose first that a series $\sum a_n$ with nth partial sum $S_n$ converges to the
sum $S$. Then from our discussion of the convergence of a sequence given in
Chapter 3, we know that for any $\varepsilon > 0$ there must exist some integer $N$ such
that
$$|S_n - S| < \varepsilon \quad \text{for } n > N.$$
This immediately implies the additional result
$$|S_{n+1} - S| < \varepsilon.$$
Hence,
$$\varepsilon + \varepsilon > |S_{n+1} - S| + |S_n - S| = |S_{n+1} - S| + |S - S_n| \ge |S_{n+1} - S_n|.$$
However, as $S_{n+1} - S_n = a_{n+1}$, we have proved that
$$|a_{n+1}| < 2\varepsilon \quad \text{for } n > N.$$
As $\varepsilon$ was arbitrary, this shows that for a series to be convergent, it is necessary
that
$$\lim_{n\to\infty} |a_n| = 0$$
or, equivalently,
$$\lim_{n\to\infty} a_n = 0.$$
If this is not the case then the series $\sum_{n=1}^{\infty} a_n$ must diverge. This condition thus
provides us with a positive test for divergence.
theorem 12-2(a) (test for divergence) The series $\sum_{n=1}^{\infty} a_n$ diverges if
$$\lim_{n\to\infty} a_n \ne 0.$$
This theorem shows, for example, that the series (d) is divergent, because
$a_n = (2n^2 + 1)/(4n + 2)$, and hence it increases without bound as n increases.
It is important to take note of the fact that this theorem gives no information
in the event that $\lim_{n\to\infty} a_n = 0$. Although we have shown that this is a necessary
condition for convergence, it is not a sufficient condition because divergent
series exist for which the condition is true.
Theorem 12-2(a) gives no information about either series (b) or (c) as in
each case $\lim_{n\to\infty} a_n = 0$. In fact, by using another argument, we have already
proved that the series representation for e in (b) is convergent, whereas we
shall prove shortly that the harmonic series (c) is divergent. Series (e) must
also be divergent according to our definition, because $a_n$ oscillates finitely
between 1 and $-1$, and also $S_n$ does not tend to any limit.
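The behaviour of series (c) and (e) can be seen by tabulating partial sums. The sketch below is only a numerical illustration and proves nothing: it shows the partial sums of (e) oscillating between 1 and 0, and those of the harmonic series (c) continuing to grow even though its terms tend to zero. The helper `partial_sums` is an illustrative name.

```python
from itertools import accumulate

def partial_sums(term, n_terms):
    """List of partial sums S_1, ..., S_n for the given general term."""
    return list(accumulate(term(n) for n in range(1, n_terms + 1)))

grandi = partial_sums(lambda n: (-1) ** (n + 1), 6)     # series (e)
harmonic = partial_sums(lambda n: 1.0 / n, 10_000)      # series (c)

print(grandi)                        # [1, 0, 1, 0, 1, 0]
print(harmonic[99], harmonic[9_999])  # S_100 and S_10000
```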
The terms of series are not always of the same sign, and so it is useful to
associate with the series $\sum a_n$ the companion series $\sum |a_n|$. If this latter
series is convergent, then the series $\sum a_n$ is said to be absolutely convergent. It
can happen that although $\sum a_n$ is convergent, $\sum |a_n|$ is divergent. When this
occurs the series $\sum a_n$ is said to be conditionally convergent. Now when terms
of differing signs are involved, the sum of the absolute values of the terms of a
series clearly exceeds the sum of the terms of the series, and so it seems reasonable
to expect that absolute convergence implies convergence. Let us prove
this fact.
theorem 12-2(b) (absolute convergence implies convergence) If the series
$\sum_{n=1}^{\infty} |a_n|$ is convergent, then so also is the series $\sum_{n=1}^{\infty} a_n$.
Proof The proof of this result is simple. Let $S_n = a_1 + a_2 + \cdots + a_n$
and $S_n' = |a_1| + |a_2| + \cdots + |a_n|$ be the nth partial sums, respectively,
of the series $\sum a_n$ and $\sum |a_n|$ in Theorem 12-2(b). Then, as $a_r + |a_r|$ is either
zero or $2|a_r|$, it follows that
$$0 \le S_n + S_n' \le 2S_n'.$$
Now by supposition $\lim_{n\to\infty} S_n' = S'$ exists, and the sums $S_n + S_n'$ form a
non-decreasing sequence, so that taking limits we arrive at
$$0 \le \lim_{n\to\infty} (S_n + S_n') \le 2S'.$$
This implies that the series with nth term $a_n + |a_n|$ must be convergent and
hence, using Theorem 12-1, that $\sum a_n$ must be convergent.
Example 12-3 Consider the series
$$\sum_{n=0}^{\infty} \frac{(-1)^n}{n!} = 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots.$$
As $a_n = (-1)^n/n!$, we have $|a_n| = 1/n!$, which is the general term of the
exponential series defining e. Thus Theorem 12-2(b), and the convergence of
the exponential series, together imply the convergence of $\sum_{n=0}^{\infty} (-1)^n/n!$. In
fact this is the series representation of 1/e.
Suppose $\sum b_n$ is a convergent series of positive terms, and that $\sum a_n$ is a
series with the property that if $N$ is some positive integer, then $|a_n| \le b_n$
for $n > N$. Then clearly the convergence of $\sum b_n$ implies the convergence of
$\sum |a_n|$ and, by Theorem 12-2(b), also the convergence of $\sum a_n$. By a similar
argument, if for $n > N$, $0 \le b_n \le a_n$, and $\sum b_n$ is known to be divergent,
then clearly $\sum a_n$ must also be divergent. We incorporate these results into a
useful comparison test.
theorem 12-3 (comparison test)
(a) Convergence test Let $\sum b_n$ be a convergent series of positive terms, and
let $\sum a_n$ be a series with the property that there exists a positive integer $N$
such that
$$|a_n| \le b_n \quad \text{for } n > N.$$
Then $\sum a_n$ is an absolutely convergent series.
(b) Divergence test Let $\sum b_n$ be a divergent series of positive terms, and let
$\sum a_n$ be a series of positive terms with the property that there exists a positive
integer $N$ such that
$$0 \le b_n \le a_n \quad \text{for } n > N.$$
Then $\sum a_n$ is a divergent series.
Example 12-4
(a) Consider the series $\sum_{n=1}^{\infty} [2 + (-1)^n]/2^n$. We have
$$a_n = \frac{2 + (-1)^n}{2^n} \le \frac{3}{2^n},$$
and as $\sum_{n=1}^{\infty} 3/2^n = 3 \sum_{n=1}^{\infty} 1/2^n = 3$, the conditions of Theorem 12-3 (a) are
satisfied if we set $b_n = 3/2^n$. It thus follows that the series $\sum a_n$ is convergent.
(b) Consider the series $\sum_{n=1}^{\infty} (n + 1)/n^2$. Here we have
$$a_n = \frac{n+1}{n^2} = \frac{1}{n}\left(\frac{n+1}{n}\right) > \frac{1}{n},$$
and as the harmonic series $\sum 1/n$ is divergent, the conditions of Theorem
12-3 (b) are satisfied when we set $b_n = 1/n$. Hence $\sum a_n$ is divergent.
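The comparison used in Example 12-4 (a) can be checked term by term. The following Python sketch assumes the series is summed from n = 1 and uses illustrative names `a` and `b` for the general terms of the two series; it verifies the inequality between the terms for the first thirty values of n and prints the two partial sums.

```python
from fractions import Fraction

def a(n):
    """General term of the series under test."""
    return Fraction(2 + (-1) ** n, 2**n)

def b(n):
    """General term of the comparison series."""
    return Fraction(3, 2**n)

N = 30
termwise_ok = all(abs(a(n)) <= b(n) for n in range(1, N + 1))
sum_a = sum(a(n) for n in range(1, N + 1))
sum_b = sum(b(n) for n in range(1, N + 1))
print(termwise_ok, float(sum_a), float(sum_b))
```

Equality between the terms occurs for every even n, which is why the hypothesis of the convergence test is naturally stated with a weak inequality.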
Fig. 12-1 Comparison between series and integral (panels (a) and (b)).
A powerful test for the convergence or divergence of a series $\sum a_n$ of
positive terms follows by a comparison of the shaded rectangles in Fig. 12-1.
Let $f(x)$ be a non-increasing function defined for $1 \le x < \infty$ which
decreases to zero as $x$ tends to infinity, and let $f(n) = a_n$, where $n$ is an integer.
Then we have the obvious inequality
$$\sum_{r=2}^{n} f(r) \le \int_1^n f(x)\,dx \le \sum_{r=1}^{n} f(r)$$
or, equivalently,
$$\sum_{r=2}^{n} a_r \le \int_1^n f(x)\,dx \le \sum_{r=1}^{n} a_r.$$
As the right-hand side of this inequality only exceeds the left-hand side by the
single term $a_1$, it must follow that in the limit, the infinite series $\sum a_r$ and the
integral
$$\lim_{n\to\infty} \int_1^n f(x)\,dx$$
converge or diverge together. This conclusion may be incorporated into a
test as follows.
theorem 12-4 (integral test) Let $f(x)$ be a positive non-increasing function
defined on $1 \le x < \infty$ with $\lim_{x\to\infty} f(x) = 0$. Then, if $a_n = f(n)$, the series
$\sum_{n=1}^{\infty} a_n$ converges or diverges according as
$$\int_1^{\infty} f(x)\,dx$$
is finite or infinite.
Corollary 12-4 ($R_N$ deduced from integral test) Let $f(x)$ be a positive
non-increasing function defined on $1 \le x < \infty$ with $\lim_{x\to\infty} f(x) = 0$, and let $\sum_{n=1}^{\infty} a_n$
be convergent, where $a_n = f(n)$. Then the remainder $R_N$ after $N$ terms
satisfies the inequality
$$R_N \le \int_N^{\infty} f(x)\,dx.$$
Proof The result follows at once from the obvious inequality
$$\sum_{r=N+1}^{N'} a_r \le \int_N^{N'} f(x)\,dx \le \sum_{r=N}^{N'} a_r$$
by taking the limit as $N' \to \infty$. This is possible because, by hypothesis,
$\sum a_n$ is convergent so that the improper integral involved exists.
Example 12-5
(a) Consider the series $\sum_{n=1}^{\infty} 1/n^k$, where $k > 0$. Then the function $f(x) = 1/x^k$
satisfies the conditions of Theorem 12-4. Hence this series converges or
diverges according as
$$\lim_{n\to\infty} \int_1^n \frac{dx}{x^k}$$
is finite or infinite. If $k \ne 1$ we have
$$\lim_{n\to\infty} \int_1^n \frac{dx}{x^k} = \left(\frac{1}{1-k}\right) \lim_{n\to\infty} \left[\frac{1}{n^{k-1}} - 1\right].$$
Hence for $0 < k < 1$ this limit is infinite, showing that the series is divergent
for $k$ in this range, whereas for $k > 1$ this limit has the finite value $1/(k - 1)$,
showing that the series is convergent for $k > 1$. Applying Corollary 12-4
shows that when $k > 1$, the remainder $R_N$ after $N$ terms must satisfy the
inequality
$$R_N < N^{1-k}/(k - 1).$$
When $k = 1$ we obtain the harmonic series, which must be treated
separately. As it follows that
$$\lim_{n\to\infty} \int_1^n \frac{dx}{x} = \lim_{n\to\infty} \log n \to \infty,$$
we have proved that the harmonic series is divergent.
(b) Consider the series $\sum_{n=1}^{\infty} n/(1 + n^2)$. Here we set $f(x) = x/(1 + x^2)$, so
we must examine
$$L = \lim_{n\to\infty} \int_1^n \frac{x\,dx}{1 + x^2}.$$
Setting $x^2 = u$ we find
$$L = \lim_{n\to\infty} \tfrac{1}{2}\left[\log(1 + n^2) - \log 2\right] \to \infty.$$
Hence the series is divergent.
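The remainder estimate of Example 12-5 (a) lends itself to a numerical check. The sketch below takes k = 2 and compares a long finite sum, used as a stand-in for the infinite tail, with the bound from Corollary 12-4. The cut-off `n_max` and the function names are assumptions made for illustration; truncating the tail makes the computed value slightly too small, so the comparison is indicative rather than rigorous.

```python
def p_series_tail(k, N, n_max=200_000):
    """Approximate R_N = sum of 1/n**k for n > N by a long finite sum."""
    return sum(1.0 / n**k for n in range(N + 1, n_max + 1))

def integral_bound(k, N):
    """Bound N**(1-k)/(k-1) from Corollary 12-4, valid for k > 1."""
    return N ** (1 - k) / (k - 1)

for N in (10, 100, 1000):
    print(N, p_series_tail(2, N), integral_bound(2, N))
```

The printed values show the bound lying only a little above the tail, so in this case the integral test gives a usable error estimate as well as a proof of convergence.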
Two other useful tests known as the ratio test and the nth root test may
be derived from Theorem 12-3, essentially using a geometric progression for
purposes of comparison. The idea involved in these tests is that a series is
tested against itself, and that its convergence or divergence is then deduced
from the rate at which successive terms decrease or increase.
Suppose that $\sum a_n$ is a series for which the ratio $a_{n+1}/a_n$ is always defined
and that $\lim_{n\to\infty} |a_{n+1}/a_n| = L$, where $L < 1$. Let $r$ be some fixed number
such that $L < r < 1$. Then the existence of the limit $L$ implies that there
exists an integer $N$ such that
$$|a_{n+1}| < r\,|a_n| \quad \text{for } n > N.$$
Hence it follows that
$$|a_{N+2}| < r\,|a_{N+1}|, \quad |a_{N+3}| < r\,|a_{N+2}| < r^2\,|a_{N+1}|, \ldots,$$
and in general
$$|a_{N+m+1}| < r^m\,|a_{N+1}|.$$
Thus if $R_N$ is the remainder after $N$ terms we have
$$R_N = \sum_{n=N+1}^{\infty} a_n \le \sum_{n=N+1}^{\infty} |a_n| < |a_{N+1}|\,(1 + r + r^2 + \cdots). \qquad (*)$$
The expression in brackets is a convergent geometric progression because,
by hypothesis, $r < 1$. As the remainder term $R_N$ is finite, and is less than the
sum of the absolute values of the terms comprising the tail of the series, it is
easily seen that the series $\sum a_n$ must be absolutely convergent. If $L > 1$ the
terms grow in size, and the series $\sum a_n$ is divergent. Nothing may be deduced
if $L = 1$ for then the series may either be convergent or divergent as illustrated
by Example 12-5 (a). In that case $a_{n+1}/a_n = n^k/(n + 1)^k$, giving
$\lim_{n\to\infty} |a_{n+1}/a_n| = 1$; and the series was seen to be divergent for $0 < k \le 1$
and convergent for $k > 1$.
Expressed formally, as follows, these results are called the ratio test.
theorem 12-5 (ratio test) If the series $\sum_{n=1}^{\infty} a_n$ is such that $a_n \ne 0$ and
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = L,$$
then
(a) the series $\sum a_n$ converges absolutely if $L < 1$,
(b) the series $\sum a_n$ diverges if $L > 1$,
(c) the test fails if $L = 1$.
Example 12-6
(a) Consider the series
$$\sum_{n=1}^{\infty} \frac{(-1)^n\,n!}{n^n}.$$
Then $a_n \ne 0$ and
$$\frac{a_{n+1}}{a_n} = (-1)^{2n+1}\,\frac{(n+1)!\,n^n}{(n+1)^{n+1}\,n!} = (-1)^{2n+1}\left(1 + \frac{1}{n}\right)^{-n}.$$
Hence
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} 1\Big/\left(1 + \frac{1}{n}\right)^{n} = \frac{1}{e},$$
where the final result follows by virtue of the work of Section 3-3. As $e > 1$,
the ratio test proves the absolute convergence of this series.
(b) Consider the series $\sum_{n=1}^{\infty} 1/n!$. Here $a_n = 1/n! \ne 0$ and
$$\left|\frac{a_{n+1}}{a_n}\right| = \frac{n!}{(n+1)!} = \frac{1}{n+1}.$$
Hence
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \frac{1}{n+1} = 0,$$
and as $0 < 1$ the ratio test proves the series to be convergent.
(c) Consider the series $\sum_{n=1}^{\infty} 3^n/n$.
Then $a_n \ne 0$ and
$$\left|\frac{a_{n+1}}{a_n}\right| = \left(\frac{3^{n+1}}{n+1}\right)\left(\frac{n}{3^n}\right) = \frac{3n}{n+1}.$$
Now
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \frac{3n}{n+1} = 3,$$
and as $3 > 1$ the ratio test proves the series to be divergent.
(d) Consider the series $\sum_{n=1}^{\infty} 1/(2n + 1)^2$.
Then $a_n \ne 0$ and
$$\left|\frac{a_{n+1}}{a_n}\right| = \left(\frac{2n+1}{2n+3}\right)^2.$$
Now
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \left(\frac{2n+1}{2n+3}\right)^2 = 1,$$
so that the ratio test fails in this case. In fact the series is convergent, as may
readily be proved by use either of the comparison test, with $b_n = 1/n^2$, or
the integral test.
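The four limits found in Example 12-6 can be approached numerically by evaluating the ratio of successive terms at increasing n. The sketch below is illustrative only: the general term of (a) is taken as $(-1)^n n!/n^n$, which is inferred from the ratio displayed in the example, and the dictionary `series` and function `ratio` are hypothetical names.

```python
from fractions import Fraction
from math import factorial

series = {
    "a": lambda n: Fraction((-1) ** n * factorial(n), n**n),
    "b": lambda n: Fraction(1, factorial(n)),
    "c": lambda n: Fraction(3**n, n),
    "d": lambda n: Fraction(1, (2 * n + 1) ** 2),
}

def ratio(term, n):
    """|a_{n+1} / a_n| computed exactly, then converted to a float."""
    return float(abs(term(n + 1) / term(n)))

for name, term in series.items():
    print(name, [round(ratio(term, n), 4) for n in (10, 100, 400)])
```

The ratios for (a), (b) and (c) settle near 1/e, 0 and 3 respectively, while those for (d) creep towards 1, which is the case in which the test gives no information.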
As the remainder term $R_N$ used in the proof of the ratio test may be either
positive or negative, the estimate (*) is equivalent to
$$|R_N| < |a_{N+1}|\,(1 + r + r^2 + \cdots)$$
or, summing the geometric progression, to
$$|R_N| < \frac{|a_{N+1}|}{1 - r}.$$
This simple result provides an estimate of the error if the summation is
terminated after $N$ terms and comprises our next result.
Corollary 12-5 ($R_N$ deduced from ratio test) Let the series $\sum_{n=1}^{\infty} a_n$ be convergent,
and let the ratio test be applicable with
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = L.$$
Then, if $r$ is a number such that $L < r < 1$, and $N$ is large enough for
$|a_{n+1}| < r\,|a_n|$ to hold for $n > N$, the remainder $R_N$ after $N$ terms
is such that
$$|R_N| < \frac{|a_{N+1}|}{1 - r}.$$
Let us use Example 12-6 (a) to illustrate this and to compute $|R_4|$. We
have $L = 1/e$ and, as $e = 2.7182\ldots$, we could take $r = 0.5$. Then $1/(1 - r)
= 2$, whence
$$|R_N| < \frac{2\,N!}{(N+1)^N}.$$
Hence $|R_4| < 48/625$.
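The estimate 48/625 can be compared with the tail of the series itself. In the sketch below the general term is again assumed to be $(-1)^n n!/n^n$, and the infinite remainder is replaced by the next sixty terms, which is more than enough here because the terms shrink roughly like $e^{-n}$. The names `ratio_bound` and `tail` are illustrative.

```python
from fractions import Fraction
from math import factorial

def a(n):
    """Assumed general term of the series in Example 12-6 (a)."""
    return Fraction((-1) ** n * factorial(n), n**n)

def ratio_bound(N, r=Fraction(1, 2)):
    """|a_{N+1}| / (1 - r) from Corollary 12-5."""
    return abs(a(N + 1)) / (1 - r)

def tail(N, n_terms=60):
    """Finite stand-in for R_N: the n_terms terms that follow a_N."""
    return sum(a(n) for n in range(N + 1, N + 1 + n_terms))

print(ratio_bound(4), float(abs(tail(4))))
```

The computed tail is well inside the bound, as expected, since the choice r = 0.5 is deliberately generous compared with the limiting ratio 1/e.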
In the so-called nth root test, appeal is also made to the geometric progression
to prove convergence. Suppose the series $\sum a_n$ is such that
$$\lim_{n\to\infty} \sqrt[n]{|a_n|} = L,$$
and that $L < 1$. Then if $r$ is some definite number such that $L < r < 1$, the
existence of the limit implies that there exists an integer $N$ such that
$$\sqrt[n]{|a_n|} < r \quad \text{for } n > N.$$
Hence $|a_n| < r^n$ for $n > N$. Thus, as with the ratio test, the remainder
after $N$ terms may be overestimated by the sum of the absolute values of the
remaining terms, and the result still further overestimated in terms of
$|a_{N+1}|$ and a geometric progression with common ratio $r$. As $r < 1$ this remainder
is finite, thereby establishing that $\sum a_n$ is absolutely convergent. If
$L > 1$, then successive terms grow and the series is divergent. As with the
ratio test, the nth root test fails when $L = 1$, for then $\sum a_n$ may be either
convergent or divergent. Stated formally we have:
theorem 12-6 (nth root test) If the series $\sum_{n=1}^{\infty} a_n$ is such that
$$\lim_{n\to\infty} \sqrt[n]{|a_n|} = L,$$
then
(a) the series $\sum a_n$ is absolutely convergent if $L < 1$,
(b) the series $\sum a_n$ is divergent if $L > 1$,
(c) the test fails if $L = 1$.
Example 12-7
(a) Consider the series
$$\sum_{n=1}^{\infty} \left(\frac{nk}{3n + 1}\right)^n,$$
where $k$ is a constant. Then
$$a_n = \left(\frac{nk}{3n + 1}\right)^n \quad \text{and} \quad \lim_{n\to\infty} \sqrt[n]{|a_n|} = \lim_{n\to\infty} \frac{nk}{3n + 1} = \frac{k}{3}.$$
Thus the nth root test shows that the series will be convergent if $k < 3$ and
divergent if $k > 3$. It fails if $k = 3$, though Theorem 12-2(a) then shows the
series to be divergent.
(b) Consider the series $\sum_{n=1}^{\infty} n/2^n$.
Then $a_n = n/2^n = |a_n|$, and $\sqrt[n]{|a_n|} = \tfrac{1}{2}\sqrt[n]{n}$. Taking logarithms we find
$$\log\left[\sqrt[n]{|a_n|}\right] = \log \tfrac{1}{2} + \frac{1}{n} \log n.$$
Now by Theorem 6-4 (b) we know that $\lim_{n\to\infty} (\log n)/n = 0$, so that
$$\lim_{n\to\infty} \log\left[\sqrt[n]{|a_n|}\right] = \log \tfrac{1}{2},$$
whence
$$\lim_{n\to\infty} \sqrt[n]{|a_n|} = \tfrac{1}{2}.$$
As $\tfrac{1}{2} < 1$ the test thus proves convergence. In this instance it would have been
simpler to use the ratio test to prove convergence.
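If $\sum a_n$ is convergent by the nth root test, then we have seen that a number
$N$ exists such that $|a_n| < r^n$ for $n > N$, where $0 < r < 1$. Hence we have
$$R_N = \sum_{n=N+1}^{\infty} a_n \le |R_N| \le \sum_{n=N+1}^{\infty} |a_n| < \sum_{n=N+1}^{\infty} r^n = \frac{r^{N+1}}{1 - r},$$
and so
$$|R_N| < \frac{r^{N+1}}{1 - r}.$$
We express this overestimate of the remainder term as a corollary to the nth
root test.
Corollary 12-6 ($R_N$ deduced from nth root test) Let $\sum_{n=1}^{\infty} a_n$ be convergent by
the nth root test with
$$\lim_{n\to\infty} \sqrt[n]{|a_n|} = L.$$
Then if $r$ is a number such that $L < r < 1$, and $N$ is an integer such that
$|a_n| < r^n$ for $n > N$, the remainder $R_N$ after $N$ terms is such that
$$|R_N| < \frac{r^{N+1}}{1 - r}.$$
We illustrate the arithmetic of this result by evaluating the estimate for three
terms of the convergent series of Example 12-7 (b). In that case $L = \tfrac{1}{2}$
so that we must choose $r$ such that $\tfrac{1}{2} < r < 1$. If we select $r = \tfrac{5}{8}$, then
$$\frac{r^4}{1 - r} = \frac{8}{3}\left(\frac{5}{8}\right)^4 = \frac{625}{1536}.$$
Had $r$ been chosen closer to the value $\tfrac{1}{2}$, then a smaller value would have
been obtained. Thus by taking $r = 9/16$ it follows that
$$\frac{r^4}{1 - r} = \frac{6561}{28672}.$$
These values bound $|R_N|$ only when $N$ is large enough for $|a_n| < r^n$ to hold
for all $n > N$. For $a_n = n/2^n$ and $r = \tfrac{5}{8}$ this requires $N \ge 10$, and the true
remainder after three terms, $R_3 = \tfrac{5}{8}$, exceeds both values, so that $N = 3$ is
too small for the estimate to apply.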
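Corollary 12-6 depends on the inequality $|a_n| < r^n$ holding for every n beyond N, and it is instructive to check from which N this is true for $a_n = n/2^n$. The sketch below does so under stated assumptions: it uses the known sum $\sum_{n \ge 1} n/2^n = 2$ to obtain exact remainders, it tests the inequality only up to a finite `n_max`, and all function names are illustrative.

```python
from fractions import Fraction

def a(n):
    return Fraction(n, 2**n)

def root_bound(r, N):
    """r**(N+1) / (1 - r), the estimate in Corollary 12-6."""
    return r ** (N + 1) / (1 - r)

def true_tail(N):
    """R_N = 2 - S_N, using the known sum of n/2**n over n >= 1, which is 2."""
    return 2 - sum(a(n) for n in range(1, N + 1))

def smallest_valid_N(r, n_max=200):
    """Smallest N with a(n) < r**n for every n in N+1, ..., n_max."""
    N = n_max
    for n in range(n_max, 0, -1):
        if a(n) < r**n:
            N = n - 1
        else:
            break
    return N

r = Fraction(5, 8)
print(root_bound(r, 3), true_tail(3), smallest_valid_N(r))
print(float(root_bound(r, 10)), float(true_tail(10)))
```

With r = 5/8 the inequality first holds for all later terms at N = 10, and from there on the estimate does lie above the true remainder; for N = 3 the true remainder 5/8 is larger than 625/1536, which shows why the condition on N cannot be dropped.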
For our final result we prove that all series in which the signs of terms
alternate, whilst the absolute values of successive terms decrease monotonically
to zero, are convergent. Such series are called alternating series and are
of the general form
$$\sum_{n=1}^{\infty} (-1)^{n+1} a_n = a_1 - a_2 + a_3 - a_4 + \cdots,$$
where $a_n > 0$ for all $n$.
To prove our assertion of convergence we assume $a_1 > a_2 > a_3 > \cdots$
and $\lim_{n\to\infty} a_n = 0$ and first consider the partial sum $S_{2r}$ corresponding to an
even number of terms $2r$. We write $S_{2r}$ in the form
$$S_{2r} = (a_1 - a_2) + (a_3 - a_4) + \cdots + (a_{2r-1} - a_{2r}).$$
Then, because $a_1 > a_2 > a_3 > \cdots$, each bracket is positive, so that
$S_{2r} > 0$ and $S_{2r}$ increases with $r$. By a slight
rearrangement of the brackets we also have
$$S_{2r} = a_1 - (a_2 - a_3) - (a_4 - a_5) - \cdots - (a_{2r-2} - a_{2r-1}) - a_{2r},$$
showing that as all the brackets and quantities are positive, $S_{2r} < a_1$. Hence,
as $S_{2r}$ is a bounded monotonic increasing sequence, we know from Chapter
3 that it must tend to a limit $S$, where
$$0 < S < a_1.$$
Next consider the partial sum $S_{2r+1}$ corresponding to an odd number of
terms $2r + 1$. We may write $S_{2r+1} = S_{2r} + a_{2r+1}$. Then, taking the limit of
$S_{2r+1}$ we have
$$\lim_{r\to\infty} S_{2r+1} = \lim_{r\to\infty} S_{2r} + \lim_{r\to\infty} a_{2r+1} = S,$$
because by supposition $\lim a_{2r+1} = 0$. Thus both the partial sums $S_{2r}$ and
the partial sums $S_{2r+1}$ tend to the same limit $S$. Hence we have proved that
for $n$ both even and odd
$$\lim_{n\to\infty} S_n = S,$$
thereby showing that the series converges.
theorem 12-7 (alternating series test) The series $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$ converges
if $a_n > 0$ and $a_{n+1} < a_n$ for all $n$ and, in addition,
$$\lim_{n\to\infty} a_n = 0.$$
Example 12-8
(a) Consider the alternating series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{2^n} = \frac{1}{2} - \frac{1}{2^2} + \frac{1}{2^3} - \frac{1}{2^4} + \cdots,$$
in which the absolute value of the general term $a_n = (\tfrac{1}{2})^n$. Then, as it is true that
$a_{n+1} < a_n$ and $\lim a_n = 0$, the test shows that the series is convergent.
(b) Consider the alternating series
$$\sum_{n=1}^{\infty} (-1)^{n+1}\,\sqrt[n+1]{2} = \sqrt{2} - \sqrt[3]{2} + \sqrt[4]{2} - \sqrt[5]{2} + \cdots,$$
in which the absolute value of the general term $a_n = \sqrt[n+1]{2}$. Now it is true that
$a_{n+1} < a_n$, but $\lim a_n = 1$, so that the last condition of the theorem is
violated rendering it inapplicable. Theorem 12-2(a) shows the series to be
divergent.
The form of argument that was used to show $0 < S_{2r} < a_1$ also shows
that
$$0 < \sum_{r=2m+1}^{\infty} (-1)^{r+1} a_r < a_{2m+1}$$
and, by a slight modification, that
$$-a_{2m} < \sum_{r=2m}^{\infty} (-1)^{r+1} a_r < 0.$$
As $R_{2m} = \sum_{r=2m+1}^{\infty} (-1)^{r+1} a_r$ is the remainder after an even number $2m$ of terms,
and $R_{2m-1} = \sum_{r=2m}^{\infty} (-1)^{r+1} a_r$ is the remainder after an odd number $2m - 1$ of
terms, it follows that if $N$ is either even or odd, then
$$0 < |R_N| < a_{N+1}.$$
Expressed in words this asserts that when an alternating series is terminated
after the Nth term, the absolute value of the error involved is less than
the magnitude $a_{N+1}$ of the next term.
Corollary 12-7 ($R_N$ for alternating series) If the alternating series $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$
converges, and $R_N$ is the remainder after $N$ terms, then
$$0 < |R_N| < a_{N+1}.$$
Using the convergent alternating series in Example 12-8 (a) for purposes
of illustration we see that $a_n = 1/2^n$, and so the remainder $R_N$ must be such
that
$$0 < |R_N| < 1/2^{N+1}.$$
For example, termination of the summation of this series after five terms
would result in an error whose absolute magnitude is less than 1/64.
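The error bound for the series of Example 12-8 (a) is easy to confirm, because the series is itself a geometric progression with first term 1/2 and common ratio $-1/2$, whose sum is 1/3. The sketch below compares the exact error after N terms with the bound $1/2^{N+1}$; the function name `partial` is an illustrative choice.

```python
from fractions import Fraction

def partial(N):
    """S_N for the series 1/2 - 1/4 + 1/8 - ..."""
    return sum(Fraction((-1) ** (n + 1), 2**n) for n in range(1, N + 1))

S = Fraction(1, 3)   # sum of the geometric progression with ratio -1/2
for N in range(1, 8):
    error = abs(S - partial(N))
    bound = Fraction(1, 2 ** (N + 1))
    print(N, error, bound, error < bound)
```

After five terms the exact error is 1/96, comfortably inside the bound 1/64 given by the corollary.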
A calculation involving the summation of a finite number of terms is
often facilitated by grouping and interchanging their order. Although these
operations are legitimate when the number of terms involved is finite, we
must question their validity when dealing with an infinite number of terms.
Later we shall show that the grouping of terms is permissible for any conver-
gent series, but that rearrangement of terms is only permissible in a series
when it is absolutely convergent, for only then does this operation leave the
sum unaltered.
An example will help here to indicate the dangers of manipulating a series
without first questioning the legitimacy of the operations to be performed
upon it. Consider the alternating series
$$1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \tfrac{1}{5} - \tfrac{1}{6} + \cdots,$$
which is seen to be convergent by virtue of our last theorem, and denote its
sum by $S$. Then we have
$$S = 1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \tfrac{1}{5} - \tfrac{1}{6} + \tfrac{1}{7} - \tfrac{1}{8} + \tfrac{1}{9} - \tfrac{1}{10} + \tfrac{1}{11} - \tfrac{1}{12} + \cdots$$
or, on rearranging the terms,
$$S = 1 - \tfrac{1}{2} - \tfrac{1}{4} + \tfrac{1}{3} - \tfrac{1}{6} - \tfrac{1}{8} + \tfrac{1}{5} - \tfrac{1}{10} - \tfrac{1}{12} + \cdots$$
$$= (1 - \tfrac{1}{2}) - \tfrac{1}{4} + (\tfrac{1}{3} - \tfrac{1}{6}) - \tfrac{1}{8} + (\tfrac{1}{5} - \tfrac{1}{10}) - \tfrac{1}{12} + \cdots$$
$$= \tfrac{1}{2} - \tfrac{1}{4} + \tfrac{1}{6} - \tfrac{1}{8} + \tfrac{1}{10} - \tfrac{1}{12} + \cdots$$
$$= \tfrac{1}{2}S.$$
This can only be true if $S = 0$, but clearly this is impossible because
Corollary 12-7 above shows that the error in the summation after only one
term is less than $\tfrac{1}{2}$ and therefore $S$ is certainly positive with $\tfrac{1}{2} < S < 1$.
What has gone wrong? The answer is that in a sense we are 'robbing Peter
to pay Paul'. This occurs because both the series $\sum 1/(2n + 1)$ and the series
$\sum 1/2n$ from which are derived the positive and negative terms in our series are
divergent, and we have so rearranged the terms that they are weighted in
favour of the negative ones. Other rearrangements could in fact be made to
yield any sum that was desired. In other words, we are working with a series
that is only conditionally convergent, and not absolutely convergent. It
would seem from this that perhaps if a series $\sum a_n$ is absolutely convergent,
then its terms should be capable of rearrangement and grouping without
altering the sum. Let us prove the truth of this conjecture, but first we prove
the simpler result that the grouping or bracketing of the terms of a convergent
series leaves its sum unaltered.
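The effect of the rearrangement can be observed numerically. The sketch below sums the alternating series in its natural order and in the rearranged order, one positive term followed by two negative ones. It relies on the standard value log 2 for the sum of the series in its natural order, a fact not yet established at this point in the text, and the function names are illustrative.

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - ... in its natural order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged(n_blocks):
    """Blocks 1/(2m-1) - 1/(4m-2) - 1/(4m): one positive, two negative terms."""
    return sum(1 / (2 * m - 1) - 1 / (4 * m - 2) - 1 / (4 * m)
               for m in range(1, n_blocks + 1))

print(alternating_harmonic(200_000), math.log(2))
print(rearranged(100_000), 0.5 * math.log(2))
```

Both orderings converge, but to different sums, the rearranged one to half the original. The contradiction in the text arises only from assuming that the rearranged series must still sum to S.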
Suppose that $\sum a_n$ is a convergent series with sum $S$. Take as representative
of the possible groupings of its terms the series derived from $\sum a_n$ by the
insertion of parentheses (brackets) as indicated below:
$$(a_1 + a_2) + (a_3 + a_4 + a_5) + a_6 + (a_7 + a_8) + \cdots.$$
Now denote the bracketed terms by $b_1, b_2, \ldots$, where $b_1 = a_1 + a_2$,
$b_2 = a_3 + a_4 + a_5, \ldots$, so that we have associated a new series $\sum b_n$ with
the original series $\sum a_n$. If the nth partial sums of $\sum a_n$ and $\sum b_n$ are $S_n$ and $S'_n$,
respectively, then the partial sums $S'_1, S'_2, S'_3, S'_4, \ldots$ of $\sum b_n$ are, in
reality, the partial sums $S_2, S_5, S_6, S_8, \ldots$ of $\sum a_n$. As $\sum a_n$ is convergent to $S$
by hypothesis, any subsequence of its partial sums $\{S_n\}$ must also converge
to $S$. In particular this applies to the sequence $S_2, S_5, S_6, S_8, \ldots$, derived by
the inclusion of parentheses. Hence $\sum b_n$ is also convergent to the sum $S$,
which proves our result.
We now examine the effect of rearranging the terms of a series. Let $\sum a_n$
be absolutely convergent so that $\sum |a_n|$ must be convergent, and let $\sum b_n$ be
a rearrangement of $\sum a_n$. Then, as the terms of $\sum |b_n|$ are in one-to-one
correspondence with those of $\sum |a_n|$, it is clear that $\sum |b_n| = \sum |a_n|$,
from which we deduce that $\sum b_n$ is also absolutely convergent.
Next we must show that $\sum a_n$ and $\sum b_n$ have the same sum. If $S_n$ is the nth
partial sum of $\sum a_n$ which has the sum $S$, then by taking $n$ sufficiently large we
may make $|S_n - S|$ as small as we wish; say less than an arbitrarily small
positive number $\varepsilon$. As $\sum |a_n|$ is convergent, we may also take $n$ large enough
for the sum of the absolute values of all the terms after $a_n$ to be less than $\varepsilon$.
Now let $S'_m$ be the mth partial sum of $\sum b_n$. Then, as $S_n$
contains the first $n$ terms of $\sum a_n$, with their suffixes in sequential order, by
taking $m$ large enough we can obviously make $S'_m$ contain all the terms of $S_n$
together with $m - n$ additional terms $a_p, a_q, \ldots, a_r$, where $n < p < q <
\cdots < r$.
Hence we may write
$$S'_m = S_n + a_p + a_q + \cdots + a_r,$$
whence
$$S'_m - S = S_n - S + a_p + a_q + \cdots + a_r.$$
Taking absolute values gives
$$|S'_m - S| \le |S_n - S| + |a_p| + |a_q| + \cdots + |a_r|.$$
Now, $n$ was chosen such that $|S_n - S| < \varepsilon$, so that
$$|S'_m - S| < \varepsilon + |a_p| + |a_q| + \cdots + |a_r|.$$
However, the remaining terms on the right-hand side of this inequality all
occur after $a_n$ in the series $\sum a_n$, and $n$ was chosen so that the sum of the
absolute values of all such terms is less than $\varepsilon$. It must follow that
their total contribution cannot exceed $\varepsilon$, and thus
$$|S'_m - S| < 2\varepsilon.$$
This shows that the mth partial sum of $\sum b_n$ converges to the sum $S$, so that
rearrangement of the terms of an absolutely convergent series is permissible
and does not affect its sum.
theorem 12-8 (grouping and rearrangement of series) If the series $\sum_{n=1}^{\infty} a_n$
is convergent, then parentheses may be inserted into the series without affecting
its sum. If, in addition, the series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, then its
terms may be rearranged without altering its sum.
Example 12-9
(a) Consider the series
$$\sum_{m=1}^{\infty} \frac{1}{m(m + 1)},$$
which is easily seen to be absolutely convergent by use of the comparison
test with $b_m = 1/m^2$. As absolute convergence obviously implies convergence,
the first part of Theorem 12-8 asserts that we may group terms by inserting
parentheses as we wish. So, using the identity
$$\frac{1}{m(m + 1)} = \frac{1}{m} - \frac{1}{m + 1},$$
we find for the nth partial sum $S_n$ the expression
$$S_n = \sum_{m=1}^{n} \left(\frac{1}{m} - \frac{1}{m + 1}\right).$$
Now successive terms in this summation cancel, or telescope as the process
is sometimes called, leaving only the first and the last. This is best seen by
writing out the expression for $S_n$ in full as follows:
$$S_n = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n-1} - \frac{1}{n}\right) + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1}.$$
Hence, if the sum of the series is $S$, we have
$$S = \lim_{n\to\infty} S_n = \lim_{n\to\infty} \left[1 - \frac{1}{n+1}\right] = 1.$$
(b) Consider the series
$$1 + 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{2^3} + \frac{1}{3^3} + \cdots,$$
which can be shown to be absolutely convergent by an extension of the nth
root test. (See Problem 12-14.) The second part of Theorem 12-8 is applicable,
so that we may rearrange terms and, denoting the sum by $S$, we obtain
$$S = \sum_{n=0}^{\infty} \frac{1}{2^n} + \sum_{n=0}^{\infty} \frac{1}{3^n} = \frac{1}{1 - \tfrac{1}{2}} + \frac{1}{1 - \tfrac{1}{3}} = 7/2.$$
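The telescoping in part (a) can be confirmed by adding the terms directly and comparing with the closed form $1 - 1/(n + 1)$. The sketch below uses exact rational arithmetic so that the comparison is an equality rather than an approximation; the function name `S` simply mirrors the notation of the example.

```python
from fractions import Fraction

def S(n):
    """n-th partial sum of 1/(m(m+1)), added term by term."""
    return sum(Fraction(1, m * (m + 1)) for m in range(1, n + 1))

for n in (1, 2, 10, 1000):
    assert S(n) == 1 - Fraction(1, n + 1)

print(S(10), float(S(1000)))
```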
The use of parentheses in a divergent series can sometimes produce a
convergent series and, conversely, when attempting to alter the form of
a convergent series a divergent series may sometimes be produced
inadvertently.
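For instance, taking Example 12-9 (a), we could have written
$$\sum_{n=1}^{\infty} \frac{1}{n(n + 1)} = \sum_{n=1}^{\infty} \left(\frac{n + 1}{n} - \frac{n + 2}{n + 1}\right)$$
$$= \sum_{n=1}^{\infty} \frac{n + 1}{n} - \sum_{n=1}^{\infty} \frac{n + 2}{n + 1}$$
$$= 2 + \sum_{n=2}^{\infty} \frac{n + 1}{n} - \sum_{n=2}^{\infty} \frac{n + 1}{n} = 2,$$
which we know to be an incorrect result. The error is, of course, contained in
the step in which we attempt to equate an absolutely convergent series
with the difference between two divergent series.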
12-2 Power series
Up to now we have been concerned entirely with series that did not contain
the variable $x$. A more general type of series called a power series in $(x - x_0)$
has the general form
$$\sum_{n=0}^{\infty} a_n (x - x_0)^n = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \cdots,$$ (12-1)
in which the coefficients $a_0, a_1, \ldots, a_n, \ldots$ are constants. When $x$ is assigned
some fixed value $\xi$, say, the power series Eqn (12-1) reduces to an ordinary
series of the kind discussed in the previous section, and so may be tested for
convergence by any appropriate test mentioned there.
For simplicity we now apply the ratio test to series Eqn (12-1), allowing $x$
to remain a free variable, in order to try to deduce the interval for $x$ in which
the series is absolutely convergent. If $\alpha_n(x)$ is the absolute value of the ratio
of the (n + 1)th term to the nth term as a function of $x$, we have
$$\alpha_n(x) = \left|\frac{a_{n+1}(x - x_0)^{n+1}}{a_n (x - x_0)^n}\right| = \left|\frac{a_{n+1}}{a_n}\right| |x - x_0|.$$
Now for any specific value of $x$, the ratio test asserts that the series will
be convergent if $\lim_{n\to\infty} \alpha_n(x) < 1$, whence we must require
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| |x - x_0| < 1.$$
Thus the largest value $r$, say, of $|x - x_0|$ for which this is true is given by
$$r = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right|,$$ (12-2)
provided that this limit exists.
The inequality
$$|x - x_0| < r$$ (12-3)
thus defines the x-interval $(x_0 - r, x_0 + r)$ within which the power series
Eqn (12-1) is absolutely convergent. For $x$ outside this interval the ratio test
shows that the power series must be divergent. (See Fig. 12-2.) The interval
itself is called the interval of convergence of the power series, and the number
$r$ is called the radius of convergence of the power series. The interval of convergence
has been deliberately displayed in the form of an open interval
because the ratio test can offer no information about the behaviour of the
series at the end points. In fact the power series may either be convergent or
divergent at these points.
Fig. 12-2 Interval of convergence: divergent for $x < x_0 - r$, absolutely convergent for $x_0 - r < x < x_0 + r$, divergent for $x > x_0 + r$.
The radius of convergence of a power series can also be deduced from the
nth root test, when it is easily seen that
$$r = \lim_{n\to\infty} \frac{1}{\sqrt[n]{|a_n|}},$$ (12-4)
provided that this limit exists.
definition 12-2 (radius of convergence of power series) The radius of
convergence $r$ of the power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ is defined either as:
$$r = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right|$$
or
$$r = \lim_{n\to\infty} \frac{1}{\sqrt[n]{|a_n|}},$$
provided that these limits exist.
Example 12-10
(a) Let us show that the series for the exponential function is absolutely
convergent for all real x. We have
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots$$
in which the general coefficient is $a_n = 1/n!$.
Now
$$\left|\frac{a_n}{a_{n+1}}\right| = \frac{(n+1)!}{n!} = n + 1,$$
so that
$$r = \lim_{n\to\infty}(n + 1) = \infty.$$
We have thus proved that the power series for $e^x$ is absolutely convergent for
all real x. This was an example of a power series with an infinite radius of
convergence.
(b) Consider the series
$$x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$$
which reduces to the illustrative example following Corollary 12-7, when
x = 1. We shall see later that this is the power series expansion of log (1 + x).
Then, again applying limit (12-2), we have $a_n = (-1)^{n+1}/n$, and so
$$\left|\frac{a_n}{a_{n+1}}\right| = \frac{n+1}{n}.$$
Thus we have
$$r = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right) = 1.$$
Hence, the series is absolutely convergent for $|x| < 1$. As we already know
the series is convergent for x = 1, and divergent for x = -1, for then it
becomes the harmonic series with the signs of all terms reversed, we have
proved that the power series for log (1 + x) is absolutely convergent for
$-1 < x < 1$ and convergent for $-1 < x \le 1$. This was an example of a power
series with radius of convergence unity.
(c) Consider the series
$$1 + x + (2x)^2 + (3x)^3 + \cdots + (nx)^n + \cdots,$$
then $a_n = n^n$ so that
$$\frac{1}{\sqrt[n]{|a_n|}} = \frac{1}{n}.$$
Hence, from Eqn (12-4),
$$r = \lim_{n\to\infty}\frac{1}{n} = 0.$$
This series has zero radius of convergence and so is absolutely convergent
only when x = 0. That is to say this power series has a finite sum, and so is
convergent, only at the one point x = 0 on the real line.
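The three cases of Example 12-10 can be explored numerically. The following is a minimal sketch, not part of the original text: it evaluates the ratio $|a_n/a_{n+1}|$ of Eqn (12-2) for increasing n using exact rational arithmetic. The function names `ratio_estimate`, `exp_coeff`, `log_coeff` and `power_coeff` are illustrative choices, and a finite n can only suggest the limit, never prove it.

```python
from fractions import Fraction
from math import factorial


def ratio_estimate(coeff, n):
    """Return |a_n / a_(n+1)|, the n-th member of the sequence in Eqn (12-2)."""
    return abs(coeff(n) / coeff(n + 1))


def exp_coeff(n):
    # a_n = 1/n!  (series for e^x, part (a))
    return Fraction(1, factorial(n))


def log_coeff(n):
    # a_n = (-1)^(n+1)/n  (series for log(1 + x), part (b), n >= 1)
    return Fraction((-1) ** (n + 1), n)


def power_coeff(n):
    # a_n = n^n  (part (c), n >= 1)
    return Fraction(n ** n)


# The three columns grow without bound, approach 1, and approach 0.
for n in (10, 100, 1000):
    print(n,
          float(ratio_estimate(exp_coeff, n)),
          float(ratio_estimate(log_coeff, n)),
          float(ratio_estimate(power_coeff, n)))
```

Exact fractions are used so that the very large values of n! and $n^n$ do not overflow floating point arithmetic.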
As a power series is yet another example of the representation of a func-
tion of the variable x, it is reasonable to enquire how we may differentiate
and integrate functions that are so defined. For simplicity we will take $x_0 = 0$,
and work with the power series about the origin
$$f(x) = \sum_{n=0}^{\infty} a_n x^n. \tag{12-5}$$
This is no restriction because Eqn (12-1) can be brought into this form by
shifting the origin by means of the change of variable $t = x - x_0$. We will
assume that Eqn (12-5) has a radius of convergence r > 0.
Intuition suggests that the derivative of f(x) could be obtained by
differentiating the right-hand side of Eqn (12-5) term by term and, similarly,
that $\int_0^x f(t)\,dt$ could be obtained by term by term integration. However,
extreme caution must be exercised in such matters for we have already seen that
what is legitimate for the sum of a finite number of terms is not necessarily
legitimate for an infinite series. Furthermore, we are now dealing with an
infinite series of functions, and not just an ordinary series. In fact we shall
show that termwise differentiation and integration of a power series is always
permissible when x lies within the interval of convergence $-r < x < r$ of Eqn
(12-5).
The justification of termwise differentiation that we now offer is perhaps
the most subtle and difficult proof to be found in this book. It has been in-
cluded because differentiation of functions defined by a power series is
fundamental to many branches of mathematics. In fact we have already
employed termwise differentiation when deriving the series representation for
$e^x$ in Chapter 6, and we shall use it again when discussing differential
equations. The proof of this result also serves to indicate how any study of the
subject beyond this level must, of necessity, involve the notion of uniform
convergence. This aspect of the proof is not emphasized here, since it is
beyond the scope of a first account.
Our object will be to prove that the function
$$F(x) = \sum_{n=1}^{\infty} n a_n x^{n-1} \tag{12-6}$$
is the derivative of the function f(x) of Eqn (12-5), that is to say that
f'(x) = F(x).
First notice that Eqns (12-5) and (12-6) have the same radius of
convergence. This follows because, by hypothesis,
$$\lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right| = r,$$
and the ratio of the mth to the (m + 1)th coefficient of Eqn (12-6) is
$m a_m/[(m + 1)a_{m+1}]$, whence
$$\lim_{m\to\infty}\left|\frac{m a_m}{(m+1)a_{m+1}}\right| = \lim_{m\to\infty}\left(\frac{m}{m+1}\right)\cdot\lim_{m\to\infty}\left|\frac{a_m}{a_{m+1}}\right| = r.$$
Next, if x and x + h are points in the interval of convergence, form the
difference quotient
$$\frac{f(x+h) - f(x)}{h} = \sum_{n=1}^{\infty} a_n\left(\frac{(x+h)^n - x^n}{h}\right). \tag{12-7}$$
The grouping of terms on the right-hand side is permissible because of the
absolute convergence of the power series for f(x) in $-r < x < r$.
Then, applying the mean value theorem for derivatives (Theorem 5-12) to
the general term on the right-hand side of Eqn (12-7), we have
$$(x+h)^n - x^n = h n \xi_n^{\,n-1},$$
where $x < \xi_n < x + h$ for n = 1, 2, .... Thus we arrive at the result
$$\frac{f(x+h) - f(x)}{h} = \sum_{n=1}^{\infty} n a_n \xi_n^{\,n-1}. \tag{12-8}$$
Then, as Eqns (12-5) and (12-6) have the same radius of convergence, we may
consider the difference between Eqns (12-6) and (12-8), again using the fact
that absolute convergence permits rearrangement of terms to give
$$F(x) - \frac{f(x+h) - f(x)}{h} = \sum_{n=2}^{\infty} n a_n\left(x^{n-1} - \xi_n^{\,n-1}\right),$$
or
$$\left|F(x) - \frac{f(x+h) - f(x)}{h}\right| \le \sum_{n=2}^{\infty} n |a_n| \left|x^{n-1} - \xi_n^{\,n-1}\right|.$$
Let us again use the mean value theorem for derivatives to obtain the result
$$x^{n-1} - \xi_n^{\,n-1} = (n-1)(x - \xi_n)\eta_n^{\,n-2},$$
where $x < \eta_n < \xi_n$. Then, as $|x - \xi_n| < |h|$, we have
$$\left|F(x) - \frac{f(x+h) - f(x)}{h}\right| \le |h| \sum_{n=2}^{\infty} n(n-1)|a_n||\eta_n|^{n-2}, \tag{12-9}$$
for $-r < x < r$.
Now the form of argument used to prove that the power series Eqn (12-6)
has radius of convergence r, also proves that the series on the right-hand side
of this inequality has radius of convergence r. So, allowing h to tend to zero,
as the sum of the series is finite the right-hand side of Eqn (12-9) also tends
to zero whilst the difference quotient approaches f'(x). Hence we have proved
our result. The difficult part of this proof was in showing that the right-hand
side of Eqn (12-9) can be made arbitrarily small independently of x in the
interval of convergence. This is the property of uniform convergence
mentioned in Chapter 3.
As differentiability implies continuity we have, as an incidental result,
proved that a power series is continuous within its interval of convergence.
A more direct proof is indicated in Problem 12-19 at the end of the chapter.
The termwise integrability of power series is easier to prove. Denote by
H(x) the series
$$H(x) = \sum_{n=0}^{\infty}\frac{a_n}{n+1}x^{n+1} \tag{12-10}$$
which is obtained by termwise integration of Eqn (12-5). That is, we wish to
show that
$$H(x) = \int_0^x f(t)\,dt.$$
Now the ratio of the nth to the (n + 1)th coefficients of Eqn (12-10) is
$(n + 1)a_{n-1}/(n a_n)$, whence
$$\lim_{n\to\infty}\left|\frac{(n+1)a_{n-1}}{n a_n}\right| = \lim_{n\to\infty}\left(\frac{n+1}{n}\right)\cdot\lim_{n\to\infty}\left|\frac{a_{n-1}}{a_n}\right| = r.$$
This shows that the power series Eqn (12-10) also has radius of convergence
r. We have just established that a power series is differentiable for x within
its interval of convergence, so that H'(x) = f(x) for $-r < x < r$. Thus by
the fundamental theorem of calculus
$$\int_0^x f(t)\,dt = H(x) - H(0) = H(x),$$
which was to be proved. Let us collect together these results into the form of
a theorem.
theorem 12-9 (differentiation and integration of power series) Let the
function f(x) be defined by the power series
$$f(x) = \sum_{n=0}^{\infty} a_n x^n,$$
with radius of convergence r > 0. Then, within the common interval of
convergence $-r < x < r$,
(a) f(x) is a continuous function;
(b) $f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}$;
(c) $\int_0^x f(t)\,dt = \sum_{n=0}^{\infty} \dfrac{a_n}{n+1} x^{n+1}$.
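Parts (b) and (c) of Theorem 12-9 can be checked numerically on a series whose sum is known in closed form. The sketch below is an illustration only and assumes the geometric series $a_n = 1$, for which $f(x) = 1/(1 - x)$, $f'(x) = 1/(1 - x)^2$ and $\int_0^x f(t)\,dt = -\log(1 - x)$ when $|x| < 1$. The function name `partial_sums` and the choice of 200 terms are assumptions made for the example.

```python
import math


def partial_sums(x, terms=200):
    """Partial sums of the geometric series and of its termwise
    derivative and termwise integral, all evaluated at x."""
    f = sum(x ** n for n in range(terms))
    df = sum(n * x ** (n - 1) for n in range(1, terms))
    integral = sum(x ** (n + 1) / (n + 1) for n in range(terms))
    return f, df, integral


x = 0.5
f, df, integral = partial_sums(x)
print(f, 1 / (1 - x))                 # series against 1/(1 - x)
print(df, 1 / (1 - x) ** 2)           # termwise derivative against f'(x)
print(integral, -math.log(1 - x))     # termwise integral against -log(1 - x)
```

A numerical agreement of this kind illustrates the theorem at one interior point; it says nothing about the end points of the interval.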
Example 12-11 Find the radius and interval of convergence of
$$f(x) = \sum_{n=1}^{\infty}\frac{x^n}{n(n+1)}.$$
Deduce f'(x) and find its interval of convergence.
Solution The nth coefficient $a_n$ of the power series for f(x) is $a_n = 1/[n(n + 1)]$,
and so the radius of convergence r is given by
$$r = \lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty}\frac{n+2}{n} = 1.$$
To specify the complete interval of convergence it remains to examine the
behaviour of the power series at the end points of the interval $-1 < x < 1$.
The series may be seen to be convergent at x = 1 by using the comparison
test with $b_n = 1/n^2$. When x = -1 the series becomes an alternating series
and is seen to be convergent by Theorem 12-7. Thus the complete interval of
convergence for f(x) is $-1 \le x \le 1$.
Under the conditions of Theorem 12-9 (b) we may differentiate the power
series for f(x) term by term within $-1 < x < 1$, so that
$$f'(x) = \sum_{n=1}^{\infty}\frac{x^{n-1}}{n+1}.$$
To specify the complete interval of convergence for this new series which, by
Theorem 12-9 (b), is certainly convergent in $-1 < x < 1$, we must again
examine the end points of the interval $-1 < x < 1$. The series for f'(x)
becomes an alternating series when x = -1, and is convergent by Theorem
12-7. At x = 1 it becomes the harmonic series with its first term omitted, and
so is divergent. The complete interval of convergence for f'(x) is thus
$-1 \le x < 1$. The effect of termwise differentiation has been to produce
divergence of the differentiated series at the right-hand end point of an
interval of convergence at which the series for f(x) is convergent.
Example 12-12 Find the power series representation of arctan x by
considering the integral
$$\arctan x = \int_0^x \frac{dt}{1 + t^2}.$$
Deduce a series expansion for $\tfrac{1}{4}\pi$.
Solution An application of the Binomial Theorem to the function $(1 + a)^{-1}$
gives the result
$$\frac{1}{1 + a} = 1 - a + a^2 - a^3 + a^4 - \cdots,$$
for $-1 < a < 1$. Setting $a = t^2$ we arrive at the power series representation
of $(1 + t^2)^{-1}$,
$$\frac{1}{1 + t^2} = 1 - t^2 + t^4 - t^6 + t^8 - \cdots. \tag{A}$$
The conditions of Theorem 12-9 (c) apply, and we may integrate this power
series term by term to obtain
$$\arctan x = \int_0^x \frac{dt}{1 + t^2} = \int_0^x \left(1 - t^2 + t^4 - t^6 + t^8 - \cdots\right)dt$$
or,
$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \tag{B}$$
This is the desired power series for arctan x and by the conditions of Theorem
12-9 (c) it is certainly convergent within the interval $-1 < x < 1$, which is
the interval of convergence of the original power series Eqn (A).
At each of the end points $x = \pm 1$ of this interval, the power series Eqn
(B) becomes an alternating series which is seen to be convergent by Theorem
12-7. Hence the interval of convergence of the integrated series Eqn (B) is
$-1 \le x \le 1$. Using the fact that arctan 1 = $\tfrac{1}{4}\pi$, we find
$$\tfrac{1}{4}\pi = 1 - \tfrac{1}{3} + \tfrac{1}{5} - \tfrac{1}{7} + \cdots.$$
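The contrast between the rapid convergence of Eqn (B) inside the interval and its slow convergence at the end point x = 1 is easily seen by computation. The following is a minimal sketch in which the function name `arctan_series` and the numbers of terms used are illustrative assumptions.

```python
import math


def arctan_series(x, terms):
    """Sum of the first `terms` terms of x - x^3/3 + x^5/5 - ... (Eqn (B))."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))


# Inside the interval of convergence a few terms already suffice.
print(arctan_series(0.5, 30), math.atan(0.5))

# At the end point x = 1 the series for pi/4 converges very slowly.
for terms in (10, 100, 1000):
    print(terms, 4 * arctan_series(1.0, terms), math.pi)
```

Because the series at x = 1 is alternating with terms decreasing in magnitude, the error after a given number of terms does not exceed the first term omitted, which explains the slow improvement.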
12-3 Taylor's theorem
So far we have discussed the convergence properties of a function f(x) which
is defined by a given power series. Let us now reverse this idea and enquire
how, when given a specific function f(x), its power series representation may
be obtained. Otherwise expressed, we are asking how the coefficients $a_n$ in
the power series
$$f(x) = \sum_{n=0}^{\infty} a_n x^n \tag{12-11}$$
may be determined when f(x) is some given function.
First, by setting x = 0, we discover that $f(0) = a_0$. Then, on the
assumption that the power series Eqn (12-11) has a radius of convergence r > 0,
differentiate it term by term to obtain
$$f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}, \tag{12-12}$$
for $-r < x < r$.
Again setting x = 0 shows that $f'(0) = a_1$. Differentiating Eqn (12-12)
again with respect to x yields
$$f''(x) = \sum_{n=2}^{\infty} n(n-1) a_n x^{n-2}, \tag{12-13}$$
from which we conclude $f''(0) = 2!\,a_2$.
Proceeding systematically in this manner gives the general result
$$f^{(n)}(x) = \sum_{m=n}^{\infty} m(m-1)\cdots(m-n+1) a_m x^{m-n}, \tag{12-14}$$
so that $f^{(n)}(0) = n!\,a_n$.
Thus the coefficients in power series Eqn (12-11) are determined by the
formula
$$a_n = \frac{f^{(n)}(0)}{n!} \tag{12-15}$$
for $n \ge 1$ and $a_0 = f(0)$.
Substituting these coefficients into Eqn (12-11) we finally arrive at the
power series
$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!}f''(0) + \cdots + \frac{x^n}{n!}f^{(n)}(0) + \cdots. \tag{12-16}$$
The expression on the right-hand side of this equation is known as the
Maclaurin series for f(x), and it presupposes that f(x) is differentiable an
infinite number of times. To justify the use of the equality sign in Eqn (12-16)
it is, of course, necessary to test the series for convergence to verify that its
radius of convergence r > 0, and to show that $|f(x) - S_n(x)| \to 0$ as $n \to \infty$,
where $S_n(x)$ is the sum of the first n terms of the Maclaurin series. We shall
return to this matter later.
To transform Eqn (12-16) into a power series in $(x - x_0)$ we set $x = x_0 + h$
and let $f(x_0 + h) = \phi(h)$. Then $\phi'(h) = f'(x_0 + h)$, $\phi''(h) = f''(x_0 + h)$,
..., $\phi^{(n)}(h) = f^{(n)}(x_0 + h)$, .... It thus follows that $\phi^{(n)}(0) = f^{(n)}(x_0)$ for
$n \ge 1$ and $\phi(0) = f(x_0)$. The Maclaurin series for $\phi(h)$ is
$$\phi(h) = \phi(0) + h\phi'(0) + \frac{h^2}{2!}\phi''(0) + \cdots + \frac{h^n}{n!}\phi^{(n)}(0) + \cdots,$$
or, reverting to the function f,
$$f(x) = f(x_0) + (x - x_0)f'(x_0) + \frac{(x - x_0)^2}{2!}f''(x_0) + \cdots + \frac{(x - x_0)^n}{n!}f^{(n)}(x_0) + \cdots. \tag{12-17}$$
Expressed in this form the expression on the right-hand side is called the
Taylor series for f(x) about the point $x = x_0$.
Example 12-13 Find the Maclaurin series for log (1 + x) and log (1 - x).
Deduce the expansion for log [(1 + x)/(1 - x)].
Solution Setting f(x) = log (1 + x) we find
$$f'(x) = \frac{1}{1 + x},\quad f''(x) = \frac{-1}{(1 + x)^2},\quad \ldots,\quad f^{(n)}(x) = \frac{(-1)^{n-1}(n-1)!}{(1 + x)^n},$$
and so
$$f^{(n)}(0) = (-1)^{n-1}(n-1)!$$
for $n \ge 1$ and f(0) = 0. Combining this expression for $f^{(n)}(0)$ with Eqn
(12-16) gives for the Maclaurin series for log (1 + x),
$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$
This has already been examined for convergence in Example 12-10 (b) and
found to be absolutely convergent in the interval $-1 < x < 1$.
In the case of the function log (1 - x) the same argument shows that
$$f^{(n)}(0) = -(n-1)!$$
for $n \ge 1$ and f(0) = 0, so that the Maclaurin series for log (1 - x) has the
form
$$\log(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{4} - \cdots.$$
This can readily be seen to have $-1 \le x < 1$ for its interval of convergence.
Using the fact that log {(1 + x)/(1 - x)} = log (1 + x) - log (1 - x)
gives the desired result
$$\log\left(\frac{1 + x}{1 - x}\right) = 2\left(x + \frac{x^3}{3} + \frac{x^5}{5} + \frac{x^7}{7} + \cdots\right)$$
for $-1 < x < 1$.
Strictly speaking, we are not yet entitled to use the equality sign between
the function and its Maclaurin series, as we have not yet established the
convergence of the nth partial sum of the series to the function it represents. We
will do this later.
Example 12-14 Use Taylor's series to express the polynomial
$$P(x) = x^4 + 3x^3 + x^2 + 2x + 1$$
in terms of powers of (x - 1).
Solution To utilize the Taylor series in Eqn (12-17) we must set $x_0 = 1$ and
f(x) = P(x). Then a simple calculation shows that
P(1) = 8, P'(1) = 17, P''(1) = 32, P'''(1) = 42, $P^{(iv)}(1) = 24$ and
$P^{(n)}(1) = 0$ for $n \ge 5$.
Hence we arrive at the finite power series
$$P(x) = 8 + (x - 1)\cdot 17 + \frac{(x - 1)^2}{2!}\cdot 32 + \frac{(x - 1)^3}{3!}\cdot 42 + \frac{(x - 1)^4}{4!}\cdot 24,$$
or
$$P(x) = 8 + 17(x - 1) + 16(x - 1)^2 + 7(x - 1)^3 + (x - 1)^4.$$
The use of the equality sign is fully justified here since we are dealing with a
finite power series.
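The calculation in Example 12-14 is purely mechanical and may be sketched as a short program. The version below is an illustration under stated assumptions: a polynomial is represented by the list of its coefficients in ascending powers of x, and the helper names `poly_eval`, `poly_derivative` and `taylor_coefficients` are hypothetical. It forms the coefficients $P^{(j)}(x_0)/j!$ of Eqn (12-17) by repeated differentiation.

```python
from fractions import Fraction
from math import factorial


def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def poly_derivative(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def taylor_coefficients(coeffs, x0):
    """Return [P(x0), P'(x0)/1!, P''(x0)/2!, ...] as exact fractions."""
    out = []
    current = list(coeffs)
    j = 0
    while current:
        out.append(Fraction(poly_eval(current, x0)) / factorial(j))
        current = poly_derivative(current)
        j += 1
    return out


# P(x) = 1 + 2x + x^2 + 3x^3 + x^4 expanded about x0 = 1.
print([int(c) for c in taylor_coefficients([1, 2, 1, 3, 1], 1)])
```

The printed coefficients are those of the powers $(x - 1)^0, (x - 1)^1, \ldots, (x - 1)^4$ in turn.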
It can happen that the derivatives of a function f(x) are not defined at
x = 0 so that its formal Maclaurin series expansion cannot be obtained. In
this case, provided the function is infinitely differentiable at the point $x = x_0$,
then f(x) may be expanded in a Taylor series about that point. Such a case is
discussed in the following simple example.
Example 12-15 Derive the nth derivative $f^{(n)}(x)$ of the function f(x) =
x log x, and show that $f^{(n)}(0)$ is not defined. Deduce the Taylor series
expansion of f(x) about the point x = 1.
Solution Direct differentiation shows that $f^{(1)}(x) = 1 + \log x$, $f^{(2)}(x) = 1/x$,
$f^{(3)}(x) = -1/x^2$, $f^{(4)}(x) = 2!/x^3$, $f^{(5)}(x) = -3!/x^4$, ..., and in general
$$f^{(n)}(x) = \frac{(-1)^n (n-2)!}{x^{n-1}}$$
for $n \ge 2$. Hence it is clear that $f^{(n)}(0)$ is not defined for any n. However, the
numbers $f^{(n)}(1)$ are defined for all n and $f^{(n)}(1) = (-1)^n (n-2)!$ for $n \ge 2$
and f(1) = 0, $f^{(1)}(1) = 1$. The Taylor series for x log x can now be obtained
from Eqn (12-17) by making the identification $x_0 = 1$ and then using the
derivatives $f^{(n)}(1)$ which have just been computed. We find
$$x\log x = (x - 1) + \frac{(x - 1)^2}{1\cdot 2} - \frac{(x - 1)^3}{2\cdot 3} + \frac{(x - 1)^4}{3\cdot 4} - \frac{(x - 1)^5}{4\cdot 5} + \cdots,$$
which is the desired result. Again, we have used the equality sign without first
showing that the nth partial sum of the Taylor series converges to x log x as
$n \to \infty$.
Regarding this as a power series in the variable t = (x - 1) we find that
the coefficient $a_n$ of the power $t^n$ is $a_n = (-1)^n/[n(n - 1)]$ for $n \ge 2$,
whence the radius of convergence
$$r = \lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty}\frac{n(n+1)}{(n-1)n} = 1.$$
The power series is thus absolutely convergent in the interval $-1 < t < 1$
or, equivalently, in $0 < x < 2$. The series is convergent when x = 2, because
then it becomes an alternating series. It is also convergent when x = 0 by
comparison with the series with the general term $b_n = 1/n^2$. In fact we can do
better than this when x = 0, for then we can actually sum the series. Aside
from the first term, which becomes -1, the sum of the remaining terms must
be +1 by virtue of Example 12-9 (a), showing that if the equality sign may be
believed, then
$$\lim_{x\to 0}(x\log x) = 0.$$
This is encouraging, because it is in agreement with the result which can be
obtained from Theorem 6-4 (b) by replacing x by 1/x. This would strongly
suggest that our series is in fact equal to x log x in the complete interval of
convergence $0 \le x \le 2$, provided x log x is assigned its limiting value 0 at x = 0.
We have attempted to emphasize that although we have indicated how a
Maclaurin or Taylor series may be associated with a function f(x) that is
infinitely differentiable, the general question of just exactly when the series
is equal to the function with which it is associated still remains open. To
indicate that an infinitely differentiable function need not be represented by
its Maclaurin series at more than a single point, despite the fact the series is
convergent for all x, we examine the function (see Problem 6-10)
$$f(x) = \begin{cases} e^{-1/x^2} & \text{for } x \ne 0 \\ 0 & \text{for } x = 0. \end{cases}$$
This function is easily seen to be infinitely differentiable, and to be such
that $f^{(n)}(0) = 0$ for all n. The Maclaurin series for f(x) is thus
$$0 + 0 + 0 + \cdots,$$
which is clearly convergent for all x, yet it is only equal to the function f(x)
at the single point x = 0. Such behaviour is quite exceptional, yet the fact
that it is associated with a seemingly simple function justifies the caution
with which we must approach the question of equality between a function
and its power series expansion.
On occasions, the computation of the nth derivative $f^{(n)}(x)$ is simplified
by employing Leibnitz's theorem as we now illustrate.
Example 12-16 If f(x) = cos (k arccos x), and $f^{(n)}(x)$ denotes the nth
derivative of f(x), show that
$$(1 - x^2)f^{(n+2)}(x) - (2n + 1)x f^{(n+1)}(x) - (n^2 - k^2)f^{(n)}(x) = 0,$$
for n = 0, 1, ..., where $f^{(0)}(x) = f(x)$. Deduce the Maclaurin series for f(x).
Solution As f(x) = cos (k arccos x), it follows by differentiation that
$$f'(x) = \frac{k\sin(k\arccos x)}{\sqrt{1 - x^2}},\qquad f''(x) = \frac{-k^2\cos(k\arccos x)}{1 - x^2} + \frac{xk\sin(k\arccos x)}{(1 - x^2)^{3/2}}.$$
A little manipulation shows that f(x) satisfies the differential equation
$$(1 - x^2)f''(x) - x f'(x) + k^2 f(x) = 0$$
or,
$$(1 - x^2)f^{(2)}(x) - x f^{(1)}(x) + k^2 f^{(0)}(x) = 0.$$
Now differentiating this equation n times, and using the symbolic
differentiation operator D, gives
$$D^n\left[(1 - x^2)f^{(2)}(x) - x f^{(1)}(x) + k^2 f^{(0)}(x)\right] = 0$$
or,
$$D^n\left[(1 - x^2)f^{(2)}(x)\right] - D^n\left[x f^{(1)}(x)\right] + D^n\left[k^2 f^{(0)}(x)\right] = 0.$$
Whence, employing Leibnitz's theorem (Theorem 5-16), this becomes
$$(1 - x^2)f^{(n+2)}(x) + n(-2x)f^{(n+1)}(x) + \frac{n(n-1)}{2!}(-2)f^{(n)}(x) - x f^{(n+1)}(x) - n f^{(n)}(x) + k^2 f^{(n)}(x) = 0,$$
showing that
$$(1 - x^2)f^{(n+2)}(x) - (2n + 1)x f^{(n+1)}(x) - (n^2 - k^2)f^{(n)}(x) = 0.$$
This is a differential equation, but setting x = 0 it reduces to a recurrence
relation for $f^{(n)}(0)$:
$$f^{(n+2)}(0) = (n^2 - k^2)f^{(n)}(0)\quad\text{for } n = 0, 1, 2, \ldots.$$
As $f^{(0)}(0) = f(0) = \cos(k\arccos 0) = \cos(\tfrac{1}{2}k\pi)$ and $f^{(1)}(0) = f'(0) =
k\sin(k\arccos 0) = k\sin(\tfrac{1}{2}k\pi)$, we have
$$f^{(2)}(0) = -k^2 f^{(0)}(0) = -k^2\cos\frac{k\pi}{2},$$
$$f^{(4)}(0) = (2^2 - k^2)f^{(2)}(0) = -k^2(2^2 - k^2)\cos\frac{k\pi}{2},$$
$$f^{(6)}(0) = (4^2 - k^2)f^{(4)}(0) = -k^2(2^2 - k^2)(4^2 - k^2)\cos\frac{k\pi}{2},$$
and
$$f^{(3)}(0) = (1^2 - k^2)f^{(1)}(0) = k(1^2 - k^2)\sin\frac{k\pi}{2},$$
$$f^{(5)}(0) = (3^2 - k^2)f^{(3)}(0) = k(1^2 - k^2)(3^2 - k^2)\sin\frac{k\pi}{2},$$
$$f^{(7)}(0) = (5^2 - k^2)f^{(5)}(0) = k(1^2 - k^2)(3^2 - k^2)(5^2 - k^2)\sin\frac{k\pi}{2},$$
and so on.
The general expressions are
$$f^{(2m-1)}(0) = k(1^2 - k^2)(3^2 - k^2)\cdots\left((2m - 3)^2 - k^2\right)\sin\frac{k\pi}{2},$$
$$f^{(2m)}(0) = -k^2(2^2 - k^2)(4^2 - k^2)\cdots\left((2m - 2)^2 - k^2\right)\cos\frac{k\pi}{2},$$
from which we conclude that the Maclaurin series for cos (k arccos x) has
the form
$$\cos(k\arccos x) = \cos\frac{k\pi}{2} + xk\sin\frac{k\pi}{2} - \frac{x^2}{2!}k^2\cos\frac{k\pi}{2} + \frac{x^3}{3!}k(1^2 - k^2)\sin\frac{k\pi}{2} - \frac{x^4}{4!}k^2(2^2 - k^2)\cos\frac{k\pi}{2} + \cdots.$$
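The recurrence relation $f^{(n+2)}(0) = (n^2 - k^2)f^{(n)}(0)$ lends itself to direct computation. The sketch below, with the hypothetical function names `maclaurin_derivatives` and `maclaurin_polynomial`, generates the derivatives at the origin and sums the Maclaurin series as far as a chosen power. It is tested on the assumed case k = 3, for which the factor $(3^2 - k^2)$ vanishes, the series terminates, and the familiar identity $\cos(3\arccos x) = 4x^3 - 3x$ results.

```python
import math


def maclaurin_derivatives(k, order):
    """Return [f(0), f'(0), ..., f^(order)(0)] for f(x) = cos(k arccos x),
    using f^(n+2)(0) = (n^2 - k^2) f^(n)(0); requires order >= 1."""
    d = [math.cos(k * math.pi / 2), k * math.sin(k * math.pi / 2)]
    for n in range(order - 1):
        d.append((n * n - k * k) * d[n])
    return d[: order + 1]


def maclaurin_polynomial(k, order, x):
    """Maclaurin series of cos(k arccos x) truncated after the power x^order."""
    d = maclaurin_derivatives(k, order)
    return sum(d[n] * x ** n / math.factorial(n) for n in range(order + 1))


x = 0.3
print(maclaurin_polynomial(3, 3, x))   # truncated series
print(math.cos(3 * math.acos(x)))      # function value
print(4 * x ** 3 - 3 * x)              # closed form for k = 3
```

For non-integral k the series does not terminate, and the truncated sum is then only an approximation whose accuracy must be assessed separately.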
To make further progress it now becomes necessary for us to settle the
question of when a Maclaurin or Taylor series is really equal to the function
with which it is associated. Let the function f(x) be infinitely differentiable
and have the Taylor series representation Eqn (12-17), and let $P_{n-1}(x)$ be
the sum of the first n terms of the series terminating at the power $(x - x_0)^{n-1}$,
so that
$$P_{n-1}(x) = f(x_0) + (x - x_0)f'(x_0) + \frac{(x - x_0)^2}{2!}f''(x_0) + \cdots + \frac{(x - x_0)^{n-1}}{(n-1)!}f^{(n-1)}(x_0).$$
Then a necessary and sufficient condition that the Taylor series should
converge to f(x) is obviously that
$$\lim_{n\to\infty}|f(x) - P_{n-1}(x)| = 0.$$
This suggests that to establish convergence we must examine the behaviour
of the remainder of the series after n terms. To achieve this we now prove
Taylor's theorem, one form of which is stated below.
theorem 12-10 (Taylor's theorem with a remainder) Let f(x) be a function
which is differentiable n times in the interval $a \le x \le b$. Then there exists a
number $\xi$, strictly between a and b, such that
$$f(b) = f(a) + (b - a)f'(a) + \frac{(b - a)^2}{2!}f''(a) + \cdots + \frac{(b - a)^{n-1}}{(n-1)!}f^{(n-1)}(a) + \frac{(b - a)^n}{n!}f^{(n)}(\xi).$$
Proof The proof of Taylor's theorem we now offer will be based on Rolle's
theorem. Let k be defined such that
$$f(b) = f(a) + (b - a)f'(a) + \cdots + \frac{(b - a)^{n-1}}{(n-1)!}f^{(n-1)}(a) + \frac{(b - a)^n}{n!}k,$$
and define the function F(x) by the expression
$$F(x) = f(b) - f(x) - (b - x)f'(x) - \cdots - \frac{(b - x)^{n-1}}{(n-1)!}f^{(n-1)}(x) - \frac{(b - x)^n}{n!}k.$$
Then F(b) = F(a) = 0, and a simple calculation, in which all but two of the
terms cancel in pairs, shows that
$$F'(x) = \frac{(b - x)^{n-1}}{(n-1)!}\left\{k - f^{(n)}(x)\right\}.$$
Since, by hypothesis, $f^{(n-1)}(x)$ is differentiable in $a \le x \le b$, the function
F(x) satisfies the conditions of Rolle's theorem, which asserts that there must
be a number $\xi$, strictly between a and b, for which $F'(\xi) = 0$. As $a < \xi < b$,
the factor $(b - \xi)^{n-1} \ne 0$, so that we must have $k = f^{(n)}(\xi)$. This completes
the proof of Taylor's theorem.
If we identify b with x and a with $x_0$, Taylor's theorem with a remainder
takes the form
$$f(x) = f(x_0) + (x - x_0)f'(x_0) + \cdots + \frac{(x - x_0)^{n-1}}{(n-1)!}f^{(n-1)}(x_0) + \frac{(x - x_0)^n}{n!}f^{(n)}(\xi), \tag{12-18}$$
where $x_0 < \xi < x$. For obvious reasons the last term of this expression is
called the remainder term and is usually denoted by $R_n(x)$. The form stated
here in which
$$R_n(x) = \frac{(x - x_0)^n}{n!}f^{(n)}(\xi), \tag{12-19}$$
with $x_0 < \xi < x$ is known as the Lagrange form of the remainder term.
When $x_0 = 0$ Eqn (12-18) reduces to Maclaurin's theorem with a Lagrange
remainder,
$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!}f''(0) + \cdots + \frac{x^{n-1}}{(n-1)!}f^{(n-1)}(0) + \frac{x^n}{n!}f^{(n)}(\xi), \tag{12-20}$$
where $0 < \xi < x$.
Example 12-17 Find the Lagrange remainders $R_n(x)$ after n terms in the
Maclaurin series expansions of $e^x$, sin x, and cos x. By showing that in each
case $R_n(x) \to 0$ as $n \to \infty$, prove that these functions are equal to their
Maclaurin series expansions.
Solution If $f(x) = e^x$, it is easily shown that Eqn (12-20) takes the form
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^{n-1}}{(n-1)!} + R_n(x),$$
where $R_n(x) = (x^n/n!)e^{\xi}$, and $0 < \xi < x$. Now $e^{\xi} < e^{|x|}$, and in connection
with Eqn (6-15) we proved that
$$\frac{x^n}{n!} < \frac{x^{R-1}}{(R-1)!}\left(\tfrac{1}{2}\right)^{n-R+1},$$
where R is an integer greater than 2x. Hence for any fixed x, $e^{|x|}$ is a finite
constant and $x^n/n! \to 0$ as $n \to \infty$. It follows from this that $R_n(x) \to 0$ as
$n \to \infty$. This provides an alternative verification of the results of Section 6-1.
If f(x) = sin x, then the Maclaurin series with a Lagrange remainder
Eqn (12-20) becomes
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{x^n}{n!}\sin\left(\xi + \frac{n\pi}{2}\right),$$
where $0 < \xi < x$. The Lagrange remainder Eqn (12-19) is the last term
$$R_n(x) = \frac{x^n}{n!}\sin\left(\xi + \frac{n\pi}{2}\right).$$
Since $|\sin[\xi + (n\pi/2)]| \le 1$ we must have
$$|R_n(x)| \le \frac{|x|^n}{n!},$$
showing that $R_n(x) \to 0$ as $n \to \infty$. This establishes the convergence of the
Maclaurin series of sin x to the function itself, and the argument for the
cosine function is exactly similar.
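The way in which the Lagrange remainder controls the error may be illustrated for the exponential function. The following sketch is not part of the original argument: the names `exp_partial_sum` and `lagrange_bound` are illustrative, and the bound used is $|R_n(x)| \le (|x|^n/n!)\,e^{|x|}$, which follows from $e^{\xi} < e^{|x|}$.

```python
import math


def exp_partial_sum(x, n):
    """Sum of the first n terms 1 + x + ... + x^(n-1)/(n-1)! of the series for e^x."""
    return sum(x ** j / math.factorial(j) for j in range(n))


def lagrange_bound(x, n):
    """Upper bound (|x|^n / n!) * e^|x| for the Lagrange remainder R_n(x)."""
    return abs(x) ** n / math.factorial(n) * math.exp(abs(x))


x = 2.0
for n in (5, 10, 15):
    actual = abs(math.exp(x) - exp_partial_sum(x, n))
    print(n, actual, lagrange_bound(x, n))
```

In each row the actual error lies below the bound, and both decrease rapidly once n exceeds 2x, in keeping with the inequality quoted from Eqn (6-15).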
Example 12-18 Establish that log (1 + x) converges to its Maclaurin
series in the interval $-1 < x \le 1$.
Solution The Maclaurin series with a remainder is (see Example 12-13)
$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + \frac{(-1)^{n-2}x^{n-1}}{n-1} + R_n(x),$$
where the Lagrange remainder is
$$R_n(x) = \frac{(-1)^{n-1}x^n}{n(1 + \xi)^n}$$
with $0 < \xi < x$. For the interval $0 \le x \le 1$, we must have $0 < \xi < 1$ so that
$1 + \xi > 1$, and hence $(1 + \xi)^n > 1$. Thus $|R_n(x)| < x^n/n \le 1/n \to 0$ as
$n \to \infty$, thereby proving convergence of the Maclaurin series to log (1 + x)
for $0 \le x \le 1$.
We must proceed differently to prove convergence for the interval
$-1 < x < 0$. Set y = -x and consider the interval $0 < y < 1$, in which
we may write
$$\log(1 + x) = \log(1 - y) = -\int_0^y \frac{dt}{1 - t}.$$
Using the identity
$$\frac{1}{1 - t} = 1 + t + t^2 + \cdots + t^{n-1} + \frac{t^n}{1 - t}$$
we have, after integration,
$$\log(1 - y) = -y - \frac{y^2}{2} - \frac{y^3}{3} - \cdots - \frac{y^n}{n} - \int_0^y \frac{t^n\,dt}{1 - t}.$$
Thus our remainder term is now expressed in the form of the integral
$$R_n(y) = -\int_0^y \frac{t^n}{1 - t}\,dt.$$
Now,
$$|R_n(y)| = \int_0^y \frac{t^n}{1 - t}\,dt < \left(\frac{1}{1 - y}\right)\int_0^y t^n\,dt = \frac{y^{n+1}}{(1 - y)(n + 1)} < \frac{1}{(1 - y)(n + 1)},$$
so that $|R_n(y)| \to 0$ as $n \to \infty$. This establishes convergence in the interval
$-1 < x < 0$. Taken together with the first result we have succeeded in
showing that the Maclaurin series of log (1 + x) converges to the function
itself in the interval $-1 < x \le 1$. This provides the justification for our
final result in Example 12-13.
When performing numerical calculations with Taylor series, the remainder
term provides information on the number of terms that must be retained in
order to attain any specified accuracy. Suppose, for example, we wished to
calculate sin 31° correct to five decimal places by means of Eqn (12-18).
Then first we would need to set f(x) = sin x to obtain
$$\sin x = \sin x_0 + (x - x_0)\cos x_0 - \frac{(x - x_0)^2}{2!}\sin x_0 + \cdots + \frac{(x - x_0)^{n-1}}{(n-1)!}\sin\left(x_0 + \frac{(n-1)\pi}{2}\right) + R_n(x),$$
where the remainder
$$R_n(x) = \frac{(x - x_0)^n}{n!}\sin\left(\xi + \frac{n\pi}{2}\right),$$
with $x_0 < \xi < x$.
As the arguments of trigonometric functions must be specified in radian
measure it is necessary to set x equal to the radian equivalent of 31° and then
to choose a convenient value for $x_0$. We have 31° is equivalent to $\pi/6 + \pi/180$
radians, so that a convenient value for $x_0$ would be $x_0 = \pi/6$. This is, of
course, the radian equivalent of 30°. The remainder term $R_n(x)$ now becomes
$$R_n(x) = \left(\frac{\pi}{180}\right)^n\frac{1}{n!}\sin\left(\xi + \frac{n\pi}{2}\right),$$
whence
$$|R_n(x)| \le \left(\frac{\pi}{180}\right)^n\frac{1}{n!}.$$
For our desired accuracy we must have $|R_n(x)| < 5\times 10^{-6}$. Hence n
must be such that
$$\left(\frac{\pi}{180}\right)^n\frac{1}{n!} < 5\times 10^{-6}.$$
A short calculation soon shows this condition is satisfied for $n \ge 3$, so that
the expansion need only contain powers as far as $(x - x_0)^2$.
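The "short calculation" may be carried out as follows. This is a minimal sketch under stated assumptions: the names `terms_needed` and `sin_taylor` are hypothetical, and the library sine is used only to supply the exact values of the derivatives $\sin(x_0 + j\pi/2)$ at $x_0 = \pi/6$ and to provide a reference value for comparison.

```python
import math


def terms_needed(h, tol):
    """Smallest n for which |h|^n / n!, the bound on |R_n|, is below tol."""
    n = 1
    while abs(h) ** n / math.factorial(n) >= tol:
        n += 1
    return n


def sin_taylor(x, x0, n):
    """First n terms of the Taylor series of sin about x0; the j-th
    derivative of sin at x0 is sin(x0 + j*pi/2)."""
    h = x - x0
    return sum(h ** j / math.factorial(j) * math.sin(x0 + j * math.pi / 2)
               for j in range(n))


x0 = math.pi / 6                 # 30 degrees
x = x0 + math.pi / 180           # 31 degrees
n = terms_needed(x - x0, 5e-6)
approx = sin_taylor(x, x0, n)
print(n, approx, math.sin(x), abs(approx - math.sin(x)))
```

The number of terms reported agrees with the statement that powers up to $(x - x_0)^2$ suffice, and the observed error is smaller than the bound, as must be the case since the bound replaces the sine factor in $R_n(x)$ by unity.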
The polynomial
$$P_{n-1}(x) = f(x_0) + (x - x_0)f'(x_0) + \cdots + \frac{(x - x_0)^{n-1}}{(n-1)!}f^{(n-1)}(x_0) \tag{12-21}$$
associated with Taylor's theorem as expressed in Eqn (12-18) is called a
Taylor polynomial of degree (n - 1) about the point $x = x_0$. It is obviously
an approximating polynomial for the function f(x) in the sense that
$|f(x) - P_{n-1}(x)| \to 0$ as $n \to \infty$ for all x within the interval of convergence. Hence
$P_{n-1}(x)$ is strictly analogous to the nth partial sum used in the previous
section. By way of example, the Taylor polynomial $P_3(x)$ for the exponential
function $e^x$ about the point x = 0 is
$$P_3(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!},$$
whilst its general Taylor polynomial $P_n(x)$ about the point x = 0 is
$$P_n(x) = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!}.$$
Example 12-19 Evaluate the integral
$$I = \int_0^{0.2} e^{-x^2}\,dx$$
by approximating $e^{-x^2}$ by its Taylor polynomial $P_2(x)$ about the point x = 0.
Estimate the error involved in using this approximation.
Solution Setting $f(x) = e^{-x^2}$ it is a straightforward matter to show that
f(0) = 1, f'(0) = 0, f''(0) = -2 and $f'''(x) = 4x(3 - 2x^2)e^{-x^2}$. Hence
$P_2(x) = 1 - x^2$, and by Taylor's theorem with a remainder
$$e^{-x^2} = P_2(x) + \frac{x^3}{3!}f'''(\xi),$$
where $0 < \xi < 0.2$.
Now we have
$$\int_0^{0.2} P_2(x)\,dx = \int_0^{0.2}(1 - x^2)\,dx = 0.1973,$$
which is our approximate value for the integral. To assess the error E we use
the fact that
$$E = \left|\int_0^{0.2} e^{-x^2}\,dx - \int_0^{0.2} P_2(x)\,dx\right| = \left|\int_0^{0.2}\left(e^{-x^2} - P_2(x)\right)dx\right| = \left|\int_0^{0.2}\frac{x^3}{3!}f'''(\xi)\,dx\right|.$$
In this expression, $\xi = \xi(x)$, because $0 < \xi < x$ and x is itself integrated over
the interval $0 \le x \le 0.2$. Although the functional form of $\xi(x)$ is unknown,
we may obtain an overestimate of E by replacing $f'''(\xi)$ by its greatest value
in the interval $0 \le x \le 0.2$. Using the fact that $f'''(x) = 4x(3 - 2x^2)e^{-x^2}$
and $\max|f'''(\xi)| = \max|f'''(x)|$ we estimate this latter quantity by assigning
to each of the three factors in f'''(x) its maximum value. We find that
$$\max|f'''(x)| \le 0.8 \cdot 3 \cdot 1 = 2.4,$$
whence
$$E \le \frac{2.4}{3!}\int_0^{0.2} x^3\,dx = 0.00016 \approx 0.0002.$$
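The estimate in Example 12-19 can be compared with the true error. The sketch below is an illustration only: it assumes the standard relation $\int_0^u e^{-x^2}dx = \tfrac{1}{2}\sqrt{\pi}\,\mathrm{erf}(u)$ to obtain a reference value from the library error function, and the variable names are arbitrary.

```python
import math

upper = 0.2

# Integral of the Taylor polynomial P_2(x) = 1 - x^2 over [0, 0.2].
approx = upper - upper ** 3 / 3

# Reference value from the error function: (sqrt(pi)/2) * erf(0.2).
exact = math.sqrt(math.pi) / 2 * math.erf(upper)

# Error bound (max|f'''| / 3!) * integral of x^3, with max|f'''| <= 0.8 * 3 * 1.
bound = (0.8 * 3 * 1) / math.factorial(3) * upper ** 4 / 4

print(approx, exact, abs(exact - approx), bound)
```

The true error is several times smaller than the bound, as is to be expected when each factor of $f'''(x)$ is replaced by its maximum value separately.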
In many books Theorem 12-10 is called the generalized mean value
theorem, since when n = 1 it reduces to the already familiar mean value
theorem derived in Chapter 5 (Theorem 5-12). Let us now derive the analogue
of Taylor's theorem with a remainder for a function of two variables.
Suppose that f(x, y) has continuous partial derivatives up to those of
nth order, and consider the function
$$F(t) = f(a + ht, b + kt), \tag{12-22}$$
in which a, b, h, and k are constants. Then F(t) = f(x, y), where x = a + ht,
y = b + kt, and in the neighbourhood of (a, b) we have
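$$\frac{dF}{dt} = \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = h\frac{\partial f}{\partial x} + k\frac{\partial f}{\partial y}.$$
Write this result in the form
$$\frac{dF}{dt} = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)f,$$
where the expression in parentheses is a partial differential operator with
respect to x and y and is not a function. It only generates a function when it
acts on a suitably differentiable function f. In consequence, differentiating
r times, we have
$$\left(\frac{d}{dt}\right)^r F = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^r f \tag{12-23}$$
with the understanding that:
$$\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)f = h\frac{\partial f}{\partial x} + k\frac{\partial f}{\partial y},$$
$$\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^2 f = h^2\frac{\partial^2 f}{\partial x^2} + 2hk\frac{\partial^2 f}{\partial x\,\partial y} + k^2\frac{\partial^2 f}{\partial y^2},$$
$$\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^3 f = h^3\frac{\partial^3 f}{\partial x^3} + 3h^2k\frac{\partial^3 f}{\partial x^2\,\partial y} + 3hk^2\frac{\partial^3 f}{\partial x\,\partial y^2} + k^3\frac{\partial^3 f}{\partial y^3}.$$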
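Now F(0) = f(a, b), F(1) = f(a + h, b + k), and F(t) is differentiable n
times for $0 \le t \le 1$. Consequently, by applying Theorem 12-10 to the
function F(t) we obtain
$$F(1) = F(0) + F'(0) + \frac{1}{2!}F''(0) + \cdots + \frac{1}{(n-1)!}F^{(n-1)}(0) + \frac{1}{n!}F^{(n)}(\xi), \tag{12-24}$$
where $0 < \xi < 1$.
However, we also have
$$F^{(r)}(0) = \left[\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^r f\right]_{x=a,\;y=b}\quad\text{and}\quad F^{(n)}(\xi) = \left[\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^n f\right]_{x=a+\xi h,\;y=b+\xi k} \tag{12-25}$$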
whence by substitution of Eqn (12-25) into Eqn (12-24) we obtain:
$$f(a + h, b + k) = f(a, b) + h f_x(a, b) + k f_y(a, b) + \frac{1}{2!}\left[\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^2 f\right]_{x=a,\;y=b} + \cdots + \frac{1}{(n-1)!}\left[\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^{n-1} f\right]_{x=a,\;y=b} + \frac{1}{n!}\left[\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^n f\right]_{x=a+\xi h,\;y=b+\xi k} \tag{12-26}$$
where $0 < \xi < 1$.
This result is Taylor's theorem for a function f(x, y) of two variables and
it is terminated with a Lagrange remainder term involving nth partial
derivatives. The result is also often known as the generalized mean value theorem
for a function of two variables. In particular, by taking n = 1 we obtain the
result
$$f(a + h, b + k) = f(a, b) + h f_x(a + \xi h, b + \xi k) + k f_y(a + \xi h, b + \xi k), \tag{12-27}$$
where $0 < \xi < 1$. This is the two variable analogue of Theorem 5-12 to
which it obviously reduces when f = f(x), for then $f_y = 0$. Result Eqn
(12-26) is of such importance that it merits stating in the form of a theorem.
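theorem 12-11 (generalized mean value theorem in two variables) Let
f(x, y) have continuous partial derivatives up to those of order n in some
neighbourhood of the point (a, b). Then if (x, y) is any point within this
neighbourhood,
$$f(x, y) = f(a, b) + \left[\left((x - a)\frac{\partial}{\partial x} + (y - b)\frac{\partial}{\partial y}\right)f\right]_{(a,b)} + \cdots + \frac{1}{(n-1)!}\left[\left((x - a)\frac{\partial}{\partial x} + (y - b)\frac{\partial}{\partial y}\right)^{n-1} f\right]_{(a,b)} + R_n(x, y),$$
where the Lagrange remainder
$$R_n(x, y) = \frac{1}{n!}\left[\left((x - a)\frac{\partial}{\partial x} + (y - b)\frac{\partial}{\partial y}\right)^n f\right]_{(\eta,\zeta)}$$
in which $\eta = a + \xi(x - a)$, $\zeta = b + \xi(y - b)$, and $0 < \xi < 1$.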
Example 12-20 Use the generalized mean value theorem in two variables
to expand the function
$$f(x, y) = e^{x + 2xy}$$
about the point (0, 0). Terminate the expansion with the Lagrange remainder
term $R_3(x, y)$ and display its form.
Solution As the expansion is required about the point (0, 0) we must set
a = 0, b = 0 in Theorem 12-11 and take n = 3. Routine calculation shows
that:
$$f(0, 0) = 1,\quad f_x(0, 0) = 1,\quad f_y(0, 0) = 0,\quad f_{xx}(0, 0) = 1,\quad f_{xy}(0, 0) = 2,\quad f_{yy}(0, 0) = 0,$$
whilst
$$f_{xxx}(x, y) = (1 + 2y)^3 e^{x+2xy},$$
$$f_{xxy}(x, y) = 2(1 + 2y)[2 + x(1 + 2y)]e^{x+2xy},$$
$$f_{yyx}(x, y) = 4x[2 + x(1 + 2y)]e^{x+2xy},$$
$$f_{yyy}(x, y) = 8x^3 e^{x+2xy}.$$
From Theorem 12-11 we find
$$e^{x+2xy} = 1 + x + \tfrac{1}{2}x^2 + 2xy + R_3(x, y),$$
where
$$R_3(x, y) = \frac{1}{3!}\left[x^3 f_{xxx}(x, y) + 3x^2 y f_{xxy}(x, y) + 3xy^2 f_{yyx}(x, y) + y^3 f_{yyy}(x, y)\right]_{(\eta,\zeta)}$$
with $\eta = \xi x$, $\zeta = \xi y$, and $0 < \xi < 1$.
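Since $R_3(x, y)$ is built from terms of third degree in x and y, the error of the quadratic approximation should fall by a factor of about eight when both x and y are halved. The following sketch, in which the function names `f` and `second_order` are illustrative, checks this behaviour numerically at two assumed sample points.

```python
import math


def f(x, y):
    """The function e^(x + 2xy) of Example 12-20."""
    return math.exp(x + 2 * x * y)


def second_order(x, y):
    """The expansion 1 + x + x^2/2 + 2xy without the remainder R_3."""
    return 1 + x + 0.5 * x ** 2 + 2 * x * y


err_coarse = abs(f(0.1, 0.1) - second_order(0.1, 0.1))
err_fine = abs(f(0.05, 0.05) - second_order(0.05, 0.05))
print(err_coarse, err_fine, err_coarse / err_fine)
```

The ratio is close to, but not exactly, eight because the third derivatives in $R_3$ are evaluated at the intermediate point $(\eta, \zeta)$, which itself moves as x and y change.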
12-4 Application of Taylor's theorem
The applications of Taylor's theorem with a remainder are so numerous that
we can do no more here than describe some of the most common. It is hoped
that these illustrations will indicate the power of this theorem and the fact
that its use is not confined exclusively to the estimation of errors in the series
expansion of functions.
12-4 (a) Indeterminate forms
The form of L'Hospital's rule given in Theorem 5-14 is capable of immediate
extension as follows.
theorem 12-12 (extended L'Hospital's rule) Let $f(x)$ and $g(x)$ be $n$ times
differentiable functions which are such that $f(a) = g(a) = 0$ and $f^{(r)}(a) =
g^{(r)}(a) = 0$ for $r = 1, 2, \ldots, n - 1$, but $\lim_{x \to a} f^{(n)}(x)$ and
$\lim_{x \to a} g^{(n)}(x)$ are not both zero.
Then
\[
\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a} f^{(n)}(x)}{\lim_{x \to a} g^{(n)}(x)}.
\]
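Proof Using Taylor's theorem with a remainder to expand numerator and
denominator separately gives
\[
\frac{f(a + h)}{g(a + h)}
= \frac{f(a) + h f^{(1)}(a) + \cdots + \dfrac{h^n}{n!} f^{(n)}(\xi_1)}
       {g(a) + h g^{(1)}(a) + \cdots + \dfrac{h^n}{n!} g^{(n)}(\xi_2)}
= \frac{f^{(n)}(\xi_1)}{g^{(n)}(\xi_2)},
\]
where $a < \xi_1 < a + h$, $a < \xi_2 < a + h$. If now $h \to 0$, then $\xi_1, \xi_2 \to a$ and
we obtain the result of the theorem
\[
\lim_{h \to 0} \frac{f(a + h)}{g(a + h)} = \frac{\lim_{x \to a} f^{(n)}(x)}{\lim_{x \to a} g^{(n)}(x)}.
\]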
Example 12-21 Find the value of the expression
\[
\lim_{x \to 0} \frac{x \sin x}{(a^x - 1)(b^x - 1)}.
\]
Solution This is an indeterminate form. Setting $f(x) = x \sin x$, $g(x) =
(a^x - 1)(b^x - 1)$, we first compute $f'(x)$ and $g'(x)$. We find $f'(x) = \sin x
+ x \cos x$ and $g'(x) = a^x \log a\,(b^x - 1) + b^x \log b\,(a^x - 1)$, and clearly
$\lim_{x \to 0} f'(x) = \lim_{x \to 0} g'(x) = 0$. The earlier form of L'Hospital's rule thus fails,
and we must make appeal to Theorem 12-12 and compute $f''(x)$ and $g''(x)$.
We find $f''(x) = 2 \cos x - x \sin x$ and $g''(x) = 2a^x b^x \log a \log b +
a^x (\log a)^2 (b^x - 1) + b^x (\log b)^2 (a^x - 1)$, from which we see that
$\lim_{x \to 0} f''(x) = 2$, $\lim_{x \to 0} g''(x) = 2 \log a \log b$. By the conditions of
Theorem 12-12 we have
\[
\lim_{x \to 0} \frac{x \sin x}{(a^x - 1)(b^x - 1)} = \frac{1}{\log a \log b}.
\]
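The result of Example 12-21 may be checked numerically by evaluating the quotient for small $x$. This is only a sketch, assuming Python and its standard library; the values $a = 2$, $b = 3$ and the function names are illustrative assumptions, not taken from the text.

```python
import math


def quotient(x, a, b):
    """The expression x sin x / ((a^x - 1)(b^x - 1)) for x != 0."""
    return x * math.sin(x) / ((a ** x - 1.0) * (b ** x - 1.0))


def predicted_limit(a, b):
    """The limit 1 / (log a log b) given by Theorem 12-12 (natural logarithms)."""
    return 1.0 / (math.log(a) * math.log(b))


a, b = 2.0, 3.0  # illustrative values; any a, b > 0 other than 1 would serve
for x in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"x = {x:8.1e}   quotient = {quotient(x, a, b):.6f}")
print(f"predicted limit = {predicted_limit(a, b):.6f}")
```

As $x$ decreases the computed quotient approaches the predicted value. Very much smaller values of $x$ should be avoided in such a check, because the subtraction $a^x - 1$ then loses accuracy in floating-point arithmetic.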
12-4 (b) Local behaviour of functions of one variable
In Chapter 5 we repeatedly turned to the problem of the local behaviour of a
function of one variable in order to identify local maxima, local minima, and
points of inflection. Here again Taylor's theorem with a remainder helps to
identify such points when not only the first derivative, but also successive
higher order derivatives vanish at a point.
Suppose that $f(x)$ is $n$ times differentiable near $x = a$ and that $f^{(1)}(a) =
f^{(2)}(a) = \cdots = f^{(n-1)}(a) = 0$, but that $f^{(n)}(a) \neq 0$. Then by Taylor's
theorem
\[
f(a + h) = f(a) + h f^{(1)}(a) + \cdots + \frac{h^{n-1}}{(n-1)!} f^{(n-1)}(a) + \frac{h^n}{n!} f^{(n)}(\xi),
\]
where $a < \xi < a + h$, but because of the vanishing of the first $(n - 1)$
derivatives at $x = a$ this simplifies to
\[
f(a + h) - f(a) = \frac{h^n}{n!} f^{(n)}(\xi).
\]
The behaviour of the left-hand side of this expression was used in Chapter
5 to identify the nature of the extrema involved so that we see its sign is now
determined solely by the sign of $h^n f^{(n)}(\xi)$ or, for suitably small $h$, by the sign
of $h^n f^{(n)}(a)$. It is left to the reader to verify that the following theorem is an
immediate consequence of this simple result when taken in conjunction with
Definition 5-4.
theorem 12-13 (identification of local extrema, one independent variable)
A necessary and sufficient condition that a suitably differentiable function
$f(x)$ have a local maximum (respectively, a local minimum) at $x = a$ is that
the first derivative $f^{(n)}(x)$ with a non-zero value at $x = a$ be of even
order $n$, and that $f^{(n)}(a) < 0$ (respectively, $f^{(n)}(a) > 0$). If the
first derivative other than $f^{(1)}(a)$ with a non-zero value at $x = a$ is of odd
order, then $f(x)$ has a point of inflection with an associated zero gradient
at $x = a$.
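The test of Theorem 12-13 is easily mechanized once the values of the successive derivatives at $x = a$ are known. The sketch below assumes Python; the function name and the convention that the list holds $f^{(1)}(a), f^{(2)}(a), \ldots$ in that order are illustrative assumptions rather than anything prescribed by the text.

```python
def classify_stationary_point(derivs):
    """Classify x = a from the values [f'(a), f''(a), f'''(a), ...].

    The first non-vanishing derivative decides the matter:
    even order and negative -> local maximum,
    even order and positive -> local minimum,
    odd order (other than the first) -> point of inflection with zero gradient.
    """
    if not derivs or derivs[0] != 0:
        return "not a stationary point"
    # order runs over 2, 3, 4, ... for the entries after f'(a).
    for order, value in enumerate(derivs[1:], start=2):
        if value != 0:
            if order % 2 == 1:
                return "point of inflection with zero gradient"
            return "local maximum" if value < 0 else "local minimum"
    return "undetermined from the derivatives supplied"


# f(x) = x^4 at x = 0: derivatives 0, 0, 0, 24.
print(classify_stationary_point([0, 0, 0, 24]))
# f(x) = x^3 at x = 0: derivatives 0, 0, 6.
print(classify_stationary_point([0, 0, 6]))
# f(x) = -x^2 at x = 0: derivatives 0, -2.
print(classify_stationary_point([0, -2]))
```

The three sample calls report a local minimum, a point of inflection with zero gradient, and a local maximum, respectively.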
12-4 (c) Error estimate for Simpson's rule
In Chapter 7 it was shown that if
\[
I = \int_{x_0 - h}^{x_0 + h} f(x)\,dx,
\]
then Simpson's rule for the approximate calculation of $I$ was
\[
I \approx \frac{h}{3}\left(f(x_0 - h) + 4f(x_0) + f(x_0 + h)\right).
\]
The error $E(h)$ is a function of the interval length $h$ and by definition
\[
E(h) = \frac{h}{3}\left(f(x_0 - h) + 4f(x_0) + f(x_0 + h)\right) - \int_{x_0 - h}^{x_0 + h} f(x)\,dx.
\]
Differentiating with respect to $h$ and using Theorem 7-8 to differentiate
the integral which is a function of its upper limit gives
\[
E'(h) = \frac{1}{3}\left(f(x_0 - h) + 4f(x_0) + f(x_0 + h)\right) + \frac{h}{3}\left(-f'(x_0 - h) + f'(x_0 + h)\right) - \left(f(x_0 + h) + f(x_0 - h)\right),
\]
whence $E'(0) = 0$.
Differentiating again yields
\[
E''(h) = \frac{h}{3}\left(f''(x_0 + h) + f''(x_0 - h)\right) + \frac{1}{3}\left(f'(x_0 - h) - f'(x_0 + h)\right),
\]
whence $E''(0) = 0$.
Finally, one further differentiation gives
\[
E'''(h) = \frac{h}{3}\left(f'''(x_0 + h) - f'''(x_0 - h)\right).
\]
Now setting $n = 1$ in Taylor's theorem with a remainder and applying
it to the function $f'''(x)$ on the interval $x_0 - h < x < x_0 + h$ gives
\[
f'''(x_0 + h) = f'''(x_0 - h) + 2h f^{(4)}(\xi),
\]
where $x_0 - h < \xi < x_0 + h$. Using this result in $E'''(h)$ shows that
\[
E'''(h) = \frac{2h^2}{3} f^{(4)}(\xi).
\]
Now
\[
\int_0^h E'''(t)\,dt = E''(h) - E''(0) = E''(h),
\]
so that assigning to $|f^{(4)}(\xi)|$ the maximum value $M$ of $|f^{(4)}(x)|$ in $x_0 - h
\le x \le x_0 + h$ it follows that
\[
E''(h) \le \int_0^h \frac{2t^2}{3} M\,dt = \frac{2h^3 M}{9}.
\]
A further integration using the fact that $E'(0) = 0$ gives
\[
E'(h) \le \int_0^h \frac{2t^3 M}{9}\,dt = \frac{h^4 M}{18},
\]
after which one final integration using the obvious fact that $E(0) = 0$ yields
\[
E(h) \le \int_0^h \frac{t^4 M}{18}\,dt = \frac{h^5 M}{90}.
\]
This is our desired error estimate, and as $M = \max |f^{(4)}(x)|$ for $x_0 - h
\le x \le x_0 + h$, it shows that contrary to expectation, Simpson's rule is
exact for any polynomial up to and including degree 3, for then $f^{(4)}(x)$
vanishes identically and so $M = 0$. This result is surprising because
Simpson's rule was based on the fitting of a quadratic at three
equally spaced points.
Suppose for example that we desired to calculate
\[
\int_0^{\pi/4} \sin x\,dx
\]
using Simpson's rule with only three points. Then $f(x) = \sin x$ and $f^{(4)}(x)
= \sin x$, so that if $M = \max |f^{(4)}(x)|$ for $0 \le x \le \tfrac{1}{4}\pi$, then $M = 1/\sqrt{2}$.
We have $h = \tfrac{1}{8}\pi$, so that the error incurred
\[
E(h) \le \left(\tfrac{1}{8}\pi\right)^5 \frac{1}{90\sqrt{2}} = 7.3 \times 10^{-5}.
\]
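The error estimate can be compared with the error actually incurred in this example. The following is a minimal sketch, assuming Python and its standard library; the function names are illustrative. It applies the three-point rule to $\sin x$ on $0 \le x \le \tfrac{1}{4}\pi$, evaluates the bound $h^5 M / 90$, and also confirms exactness for a sample cubic chosen purely for illustration.

```python
import math


def simpson3(f, x0, h):
    """Three-point Simpson's rule on the interval [x0 - h, x0 + h]."""
    return h / 3.0 * (f(x0 - h) + 4.0 * f(x0) + f(x0 + h))


def error_bound(h, M):
    """The estimate h^5 M / 90, where M bounds |f''''(x)| on the interval."""
    return h ** 5 * M / 90.0


# Integral of sin x from 0 to pi/4: mid-point x0 = pi/8, half-width h = pi/8.
h = math.pi / 8.0
approx = simpson3(math.sin, math.pi / 8.0, h)
exact = 1.0 - math.cos(math.pi / 4.0)
actual_error = approx - exact
bound = error_bound(h, 1.0 / math.sqrt(2.0))

print(f"Simpson value = {approx:.7f}")
print(f"exact value   = {exact:.7f}")
print(f"actual error  = {actual_error:.2e}")
print(f"error bound   = {bound:.2e}")

# Exactness for a cubic: integral of x^3 + x over [0.5, 1.5] is 2.25.
cubic_value = simpson3(lambda x: x ** 3 + x, 1.0, 0.5)
print(f"cubic check   = {cubic_value:.12f}")
```

The actual error is positive and lies below the bound, as the estimate requires.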
12-4 (d) Newton's method
Newton's method is a simple and powerful method for the accurate
determination of the roots of an equation $f(x) = 0$, and is based on Taylor's
theorem with the Lagrange remainder $R_2(x)$.
Suppose $x_0$ is an approximate root of $f(x) = 0$ and $h$ is such that $x =
x_0 + h$ is an exact root. Then by Taylor's theorem
\[
f(x_0 + h) = f(x_0) + h f'(x_0) + \frac{h^2}{2!} f''(\xi),
\]
where $x_0 < \xi < x_0 + h$.
As, by supposition, $f(x_0 + h) = 0$ we find
\[
0 = f(x_0) + h f'(x_0) + \frac{h^2}{2} f''(\xi).
\]
Now $\xi$ is not known, but on the assumption that $h$ is small we may define a
first approximation $h_1$ to $h$ by neglecting the third term and writing
\[
h_1 = -\frac{f(x_0)}{f'(x_0)}.
\]
The next approximation to the root itself must be $x_1 = x_0 + h_1$, whence by
the same argument, the approximation $h_2$ to the correction needed to make
$x_1$ an exact root is
\[
h_2 = -\frac{f(x_0 + h_1)}{f'(x_0 + h_1)}.
\]
Proceeding in this manner we find that the nth approximation $x_n$ to the exact
root of $f(x) = 0$ is, in terms of the (n − 1)th approximation $x_{n-1}$,
\[
x_n = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})}.
\]
The successive calculation of improved approximations in this manner is
called iteration, and $x_n$ itself is called the nth iterate.
If the sequence $\{x_n\}$ tends to a limit $x^*$, it follows that this limit must be
the desired root, for then the numerator of the correction term vanishes. The
choice of an approximate root $x_0$ with which to start the process may be
made in any convenient manner. The most usual method is to seek to show
that the root lies between two fairly close values $x = a$, $x = b$ and then to
take for $x_0$ any value that is intermediate between them. The numbers $a$, $b$
are usually found by direct calculation, which is used to prove that $f(a)$ and
$f(b)$ are of opposite sign, so that by the intermediate value theorem a zero of
$y = f(x)$ must occur in the interval $a < x < b$.
The reasons for both the success and failure of Newton's method are
best appreciated in geometrical terms. The calculation of $x_n$ from $x_{n-1}$
amounts to tracing back the tangent to the curve $y = f(x)$ at $x_{n-1}$ until it
intersects the x-axis at the point $x_n$. If $x_n$ lies between $x_{n-1}$ and $x^*$ for all $n$
then the process converges; otherwise it may diverge. Fig. 12-3 (a) illustrates a
convergent iteration and Fig. 12-3 (b) a divergent one.
Fig. 12-3 (a) Convergent Newton iteration process; (b) divergent Newton iteration
process.
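The two kinds of behaviour shown in Fig. 12-3 can be reproduced numerically. The sketch below assumes Python and its standard library; the two test functions, $x^2 - 2$ started at $x_0 = 2$ and $\arctan x$ started at $x_0 = 1.5$, are illustrative choices of our own and are not the curves drawn in the figure.

```python
import math


def newton_iterates(f, fprime, x0, steps):
    """Return [x0, x1, ..., x_steps] generated by x_n = x_{n-1} - f/f'."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs


# Convergent case: each iterate lies between its predecessor and the root sqrt(2).
good = newton_iterates(lambda x: x * x - 2.0, lambda x: 2.0 * x, 2.0, 5)

# Divergent case: the tangents to y = arctan x overshoot the root x = 0
# by an ever increasing amount when the starting value is 1.5.
bad = newton_iterates(math.atan, lambda x: 1.0 / (1.0 + x * x), 1.5, 5)

print("convergent:", [round(x, 6) for x in good])
print("divergent: ", [round(x, 3) for x in bad])
```

In the first sequence the iterates decrease steadily towards $\sqrt{2}$, whereas in the second their magnitudes grow at every step.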
Example 12-22 Locate the real root of the cubic
\[
x^3 + x^2 + 2x + 1 = 0.
\]
Use the result to find the remaining roots.
Solution Setting $f(x) = x^3 + x^2 + 2x + 1$ we see that $f(0) = 1 > 0$ and
$f(-1) = -1 < 0$, so that by the intermediate value theorem a root of the
equation $f(x) = 0$ must lie in the interval $-1 < x < 0$. Take $x_0 = -0.5$,
since this lies within the desired interval.
Now $f'(x) = 3x^2 + 2x + 2$ so that Newton's method requires us to
employ the relation
\[
x_n = x_{n-1} - \frac{x_{n-1}^3 + x_{n-1}^2 + 2x_{n-1} + 1}{3x_{n-1}^2 + 2x_{n-1} + 2},
\]
starting with $x_0 = -0.5$.
A straightforward calculation shows that to four decimal places $x_1 =
-0.5714$, $x_2 = -0.5698$, and $x_3 = -0.5698$. The iteration process has thus
converged to within the required accuracy in only three iterations. The real
root is $x^* = -0.5698$, and the remaining two roots can now be found by
dividing $f(x) = 0$ by the factor $(x + 0.5698)$ and then solving the remaining
quadratic in the usual manner. If this is done, long division gives
\[
\frac{x^3 + x^2 + 2x + 1}{x + 0.5698} = x^2 + 0.4302x + 1.7549,
\]
from which we find the other two roots are
\[
x = -0.2151 + i\,1.3071 \quad\text{and}\quad x = -0.2151 - i\,1.3071.
\]
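The calculation of Example 12-22 can be repeated by machine. This is a minimal sketch, assuming Python and its standard library; the function and variable names are illustrative. After three Newton iterations the cubic is deflated by the factor $(x - x^*)$, the coefficients of the quadratic being obtained by comparing powers of $x$ instead of by long division.

```python
import cmath


def f(x):
    """The cubic of Example 12-22."""
    return x ** 3 + x ** 2 + 2.0 * x + 1.0


def fprime(x):
    """Its derivative, 3x^2 + 2x + 2."""
    return 3.0 * x ** 2 + 2.0 * x + 2.0


def newton(x0, iterations):
    """Return the list of iterates x1, x2, ..., starting from x0."""
    iterates = []
    x = x0
    for _ in range(iterations):
        x = x - f(x) / fprime(x)
        iterates.append(x)
    return iterates


iterates = newton(-0.5, 3)
for n, x in enumerate(iterates, start=1):
    print(f"x{n} = {x:.4f}")

# Deflation: x^3 + x^2 + 2x + 1 = (x - root)(x^2 + b x + c),
# so that b - root = 1 and c - root * b = 2.
root = iterates[-1]
b = 1.0 + root
c = 2.0 + root * b
print(f"quadratic factor: x^2 + {b:.4f} x + {c:.4f}")

# Remaining roots from the usual formula for a quadratic.
disc = cmath.sqrt(b * b - 4.0 * c)
z1 = (-b + disc) / 2.0
z2 = (-b - disc) / 2.0
print(f"other roots: {z1:.4f} and {z2:.4f}")
```

To four decimal places the printed values agree with those quoted in the solution.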
12-5 Applications of the generalized mean value theorem
The applications of the extension of Taylor's theorem to functions of two or
more variables are perhaps even more extensive than those of Taylor's
theorem itself. This section illustrates a few of the simplest and most used,
connected mainly with functions of two variables. The final application,
connected with the least squares fitting of a polynomial, is the only one con-
cerning functions of more than two variables.
12-5 (a) Stationary points of functions of two variables
Consider the function $z = f(x, y)$ of the two real independent variables $x$, $y$
which is defined in some region $D$ of the $(x, y)$-plane bounded by the curve $\gamma$.
The notion of its graph is already familiar to us and it comprises a surface $S$
with points $(x, y, f(x, y))$, the projection of the boundary $\Gamma$ of which onto
the $(x, y)$-plane is the curve $\gamma$. A typical situation is shown in Fig. 12-4 (a, b)
where the point P is obviously a maximum and the point Q is obviously a
minimum.
Intuitively, and by analogy with the single variable case, it would seem
that all that is necessary to locate extrema such as P, Q is to find those points
$(x_0, y_0)$ at which $f_x(x_0, y_0) = f_y(x_0, y_0) = 0$. This is, in effect, saying that the
tangent plane at either a maximum or a minimum must be parallel to the
$(x, y)$-plane. Unfortunately, this is not a sufficiently stringent condition, for
the point R in Fig. 12-5 is neither a maximum, nor a minimum, yet the tangent
plane at that point is certainly parallel to the $(x, y)$-plane. Because of the
shape of the surface it is called a saddle point. It is characterized by the fact
that if the surface is sectioned through R by different planes parallel to the
z-axis, then for some the curve of section has a minimum at R and for others
a maximum.
Fig. 12-4 (a) Surface having maximum at P; (b) surface having minimum at Q.
Each of these points P, Q, R is called a stationary point of the function
$z = f(x, y)$ because $f_x$ and $f_y$ vanish at these points.
definition 12-3 (stationary points of $f(x, y)$) Let $f(x, y)$ be a differentiable
function in some region $D$ of the $(x, y)$-plane. Then any point $(x_0, y_0)$ in
$D$ for which $f_x(x_0, y_0) = 0$ and $f_y(x_0, y_0) = 0$ is called a stationary point of
the function $f(x, y)$ in $D$.
If for all $(x, y)$ near $(x_0, y_0)$ it is true that $f(x, y) < f(x_0, y_0)$, then $f(x, y)$
will be said to have a local maximum at $(x_0, y_0)$. If for all $(x, y)$ near to $(x_0, y_0)$
it is true that $f(x, y) > f(x_0, y_0)$, then $f(x, y)$ will be said to have a local
minimum at $(x_0, y_0)$. In the event that $f(x, y)$ assumes values both greater
and less than $f(x_0, y_0)$ for $(x, y)$ near to a stationary point $(x_0, y_0)$, then
$f(x, y)$ will be said to have a saddle point at $(x_0, y_0)$.
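Definition 12-3 can be illustrated by comparing $f(x_0, y_0)$ with values of $f$ at nearby points. The sketch below assumes Python and its standard library; the function name, the sampling radius, the number of sample points and the three test surfaces are all illustrative assumptions. Sampling a finite number of points on a small circle is only a heuristic check and is not a substitute for a proof.

```python
import math


def classify_by_sampling(f, x0, y0, radius=1e-3, samples=16):
    """Compare f(x0, y0) with f at points on a small circle about (x0, y0)."""
    f0 = f(x0, y0)
    higher = False  # some nearby value exceeds f(x0, y0)
    lower = False   # some nearby value is less than f(x0, y0)
    for k in range(samples):
        angle = 2.0 * math.pi * k / samples
        value = f(x0 + radius * math.cos(angle), y0 + radius * math.sin(angle))
        if value > f0:
            higher = True
        elif value < f0:
            lower = True
    if higher and lower:
        return "saddle point"
    if lower:
        return "local maximum"
    if higher:
        return "local minimum"
    return "undetermined"


# Each surface has f_x = f_y = 0 at the origin.
print(classify_by_sampling(lambda x, y: -(x * x + y * y), 0.0, 0.0))
print(classify_by_sampling(lambda x, y: x * x + y * y, 0.0, 0.0))
print(classify_by_sampling(lambda x, y: x * x - y * y, 0.0, 0.0))
```

For these three surfaces the sampled values indicate a local maximum, a local minimum, and a saddle point, respectively.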
We now use the generalized mean value theorem to prove the following
result.
theorem 12-14 (identification of extrema of $f(x, y)$) Let $f(x, y)$ be a
function with continuous fi